Abstract

Matrix completion has attracted significant recent attention in many fields including statistics,
applied mathematics and electrical engineering. Current literature on matrix completion
focuses primarily on independent sampling models under which the individual observed entries 
are sampled independently. Motivated by applications in genomic data integration, we propose 
a new framework of structured matrix completion (SMC) to treat structured missingness by 
design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the 
rows and columns of an approximately low-rank matrix are observed. We provide theoretical 
justification for the proposed SMC method and derive a lower bound for the estimation errors,
which together establish the optimal rate of recovery over certain classes of approximately
low-rank matrices. Simulation studies show that the method performs well in finite samples under
a variety of configurations. The method is applied to integrate several ovarian cancer genomic
studies with different extents of genomic measurements, which enables us to construct more
accurate prediction rules for ovarian cancer survival.
1 Introduction


Motivated by an array of applications, matrix completion has attracted significant recent 
attention in different fields including statistics, applied mathematics and electrical engineering. 
The central goal of matrix completion is to recover a high-dimensional low-rank matrix based
on a subset of its entries. Applications include recommender systems (Koren et al., 2009),
genomics (Chi et al., 2013), multi-task learning (Argyriou et al., 2008), sensor localization
(Biswas et al., 2006; Singer and Cucuringu, 2010), and computer vision (Chen and Suter, 2004;
Tomasi and Kanade, 1992), among many others.


Matrix completion has been well studied under the uniform sampling model, where observed
entries are assumed to be sampled uniformly at random. The best known approach is
perhaps the constrained nuclear norm minimization (NNM), which has been shown to yield
near-optimal results when the sampling distribution of the observed entries is uniform
(Candes and Recht, 2009; Candes and Tao, 2010; Gross, 2011; Recht, 2011; Candes and Plan, 2011).
For estimating approximately low-rank matrices from uniformly sampled noisy observations,
several penalized or constrained NNM estimators, which are based on the same principle as
the well-known Lasso and Dantzig selector for sparse signal recovery, were proposed and
analyzed (Keshavan et al., 2010; Mazumder et al., 2010; Koltchinskii, 2011; Koltchinskii et al.,
2011; Rohde et al., 2011). In many applications, the entries are sampled independently but
not uniformly. In such a setting, Salakhutdinov and Srebro showed that the standard
NNM methods do not perform well, and proposed a weighted NNM method, which depends
on the true sampling distribution. In the case of unknown sampling distribution, Foygel et al.
(2011) introduced an empirically-weighted NNM method. Cai and Zhou (2013) studied a
max-norm constrained minimization method for the recovery of a low-rank matrix based on the
noisy observations under the non-uniform sampling model. It was shown that the max-norm
constrained least squares estimator is rate-optimal under the Frobenius norm loss and yields
a more stable approximate recovery guarantee with respect to the sampling distributions.

The focus of matrix completion has so far been on the recovery of a low-rank matrix based 
on independently sampled entries. Motivated by applications in genomic data integration, 
we introduce in this paper a new framework of matrix completion called structured matrix 
completion (SMC), where a subset of the rows and a subset of the columns of an approximately 
low-rank matrix are observed and the goal is to reconstruct the whole matrix based on the 
observed rows and columns. We first discuss the genomic data integration problem before 
introducing the SMC model. 

1.1 Genomic Data Integration 

When analyzing genome-wide studies (GWS) of association, expression profiling or methylation,
ensuring adequate power of the analysis is one of the most crucial goals due to the
high dimensionality of the genomic markers under consideration. Because of cost constraints, 
GWS typically have small to moderate sample sizes and hence limited power. One approach 
to increase the power is to integrate information from multiple GWS of the same phenotype. 
However, some practical complications may hamper the feasibility of such integrative analysis. 
Different GWS often involve different platforms with distinct genomic coverage. For example, 
whole genome next generation sequencing (NGS) studies would provide mutation information 
on all loci while older technologies for genome-wide association studies (GWAS) would only 
provide information on a small subset of loci. In some settings, certain studies may provide 
a wider range of genomic data than others. For example, one study may provide extensive 
genomic measurements including gene expression, miRNA and DNA methylation while other 
studies may only measure gene expression. 

To perform integrative analysis of studies with different extents of genomic measurements,
the naive complete-observation-only approach may suffer from low power. For the GWAS
setting with a small fraction of loci missing, many imputation methods have been proposed
in recent years to improve the power of the studies. Examples of useful methods include
haplotype reconstruction, k-nearest neighbor, regression and singular value decomposition
methods (Scheet and Stephens, 2006; Li and Abecasis, 2006; Browning and Browning, 2009;
Troyanskaya et al., 2001; Kim et al., 2005; Wang et al., 2006). Many of the haplotype phasing
methods are considered to be highly effective in recovering missing genotype information
(Yu and Schaid, 2007). These methods, while useful, are often computationally intensive.
In addition, when one study has a much denser coverage than the other, the fraction of
missingness could be high and an exceedingly large number of observations would need to
be imputed. It is unclear whether it is statistically or computationally feasible to extend
these methods to such settings. Moreover, haplotype-based methods cannot be extended to
incorporate other types of genomic data such as gene expression and miRNA data.

When integrating multiple studies with different extent of genomic measurements, the 
observed data can be viewed as complete rows and columns of a large matrix A and the 
missing components can be arranged as a submatrix of A. As such, the missingness in A is 
structured by design. In this paper, we propose a novel SMC method for imputing the missing 
submatrix of A. As shown in Section 5, by imputing the missing miRNA measurements and
constructing prediction rules based on the imputed data, it is possible to significantly improve 
the prediction performance. 


1.2 Structured Matrix Completion Model 

Motivated by the applications mentioned above, this paper considers SMC where a subset of
rows and columns are observed. Specifically, we observe $m_1 < p_1$ rows and $m_2 < p_2$ columns
of a matrix $A \in \mathbb{R}^{p_1 \times p_2}$ and the goal is to recover the whole matrix. Since the
singular values are invariant under row/column permutations, it can be assumed without loss of
generality that we observe the first $m_1$ rows and $m_2$ columns of $A$, which can be written in a
block form:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad (1)$$

where the two column blocks have widths $m_2$ and $p_2 - m_2$ and the two row blocks have heights
$m_1$ and $p_1 - m_1$. Here $A_{11}$, $A_{12}$, and $A_{21}$ are observed and the goal is to recover
the missing block $A_{22}$. See Figure 1(a) in Section 2 for a graphical display of the data. Clearly
there is no way to recover $A_{22}$ if $A$ is an arbitrary matrix. However, in many applications such
as genomic data integration discussed earlier, $A$ is approximately low-rank, which makes it possible
to recover $A_{22}$ with accuracy. In this paper, we introduce a method based on the singular value
decomposition (SVD) for the recovery of $A_{22}$ when $A$ is approximately low-rank.

It is important to note that the observations here are much more "structured" compared
to the previous settings of matrix completion. As the observed entries are in full rows or
full columns, the existing methods based on NNM are not suitable. As mentioned earlier,
constrained NNM methods have been widely used in matrix completion problems based on
independently observed entries. However, for the problem considered in the present paper,
these methods do not utilize the structure of the observations and do not guarantee precise
recovery even for an exactly low-rank matrix $A$ (see Remark 1 in Section 2). Numerical results
in Section 4 show that NNM methods do not perform well in SMC.

In this paper we propose a new SMC method that can be easily implemented by a fast
algorithm which only involves basic matrix operations and the SVD. The main idea of our
recovery procedure is based on the Schur complement. In the ideal case when $A$ is exactly low
rank, the Schur complement of the missing block, $A_{22} - A_{21}A_{11}^{\dagger}A_{12}$, is zero and thus
$A_{21}A_{11}^{\dagger}A_{12}$ can be used to recover $A_{22}$ exactly. When $A$ is approximately low rank,
$A_{21}A_{11}^{\dagger}A_{12}$ cannot be used directly to estimate $A_{22}$. For this case, we transform the
observed blocks using SVD; remove some unimportant rows and columns based on thresholding rules;
and subsequently apply a similar procedure to recover $A_{22}$.

Both the theoretical and numerical properties of the estimator are studied. It is shown that the
estimator recovers low-rank matrices accurately and is robust against small perturbations. A lower
bound result shows that the estimator is rate optimal for a class of approximately low-rank
matrices. Although it is required for the theoretical analysis that there is a significant gap
between the singular values of the true low-rank matrix and those of the perturbation,
simulation results indicate that this gap is not really necessary in practice and the estimator
recovers $A$ accurately whenever the singular values of $A$ decay sufficiently fast.


1.3 Organization of the Paper 


The rest of the paper is organized as follows. In Section 2, we introduce in detail the proposed
SMC methods when $A$ is exactly or approximately low-rank. The theoretical properties of the
estimators are analyzed in Section 3. Both upper and lower bounds for the recovery accuracy
under the Schatten-$q$ norm loss are established. Simulation results are shown in Section 4 to
investigate the numerical performance of the proposed methods. A real data application to
genomic data integration is given in Section 5. Section 6 discusses a few practical issues related
to real data applications. For reasons of space, the proofs of the main results and additional
simulation results are given in the supplement (Cai et al., 2014). Some key technical tools
used in the proofs of the main theorems are also developed and proved in the supplement.


2 Structured Matrix Completion: Methodology 

In this section, we propose procedures to recover the submatrix $A_{22}$ based on the observed
blocks $A_{11}$, $A_{12}$, and $A_{21}$. We begin with basic notation and definitions that will be used
in the rest of the paper.

For a matrix $U$, we use $U_{[\Omega_1, \Omega_2]}$ to represent its sub-matrix with row indices
$\Omega_1$ and column indices $\Omega_2$. We also use the Matlab syntax to represent index sets.
Specifically, for integers $a \le b$, "$a:b$" represents $\{a, a+1, \cdots, b\}$; and ":" alone
represents the entire index set. Therefore, $U_{[:, 1:r]}$ stands for the first $r$ columns of $U$
while $U_{[(m_1+1):p_1, :]}$ stands for the $\{m_1+1, \ldots, p_1\}$th rows of $U$. For the matrix $A$
given in (1), we use the notation $A_{\bullet 1}$ and $A_{1\bullet}$ to denote
$[A_{11}^T, A_{21}^T]^T$ and $[A_{11}, A_{12}]$, respectively. For a matrix $B \in \mathbb{R}^{m \times n}$,
let $B = U\Sigma V^T = \sum_i \sigma_i(B) u_i v_i^T$ be the SVD, where
$\Sigma = \mathrm{diag}\{\sigma_1(B), \sigma_2(B), \ldots\}$ with $\sigma_1(B) \ge \sigma_2(B) \ge \cdots \ge 0$
being the singular values of $B$ in decreasing order. The smallest singular value
$\sigma_{\min(m,n)}(B)$, which will be denoted by $\sigma_{\min}(B)$, plays an important role in our
analysis. We also define $B_{\max(r)} = \sum_{i=1}^{r} \sigma_i(B) u_i v_i^T$ and
$B_{-\max(r)} = B - B_{\max(r)} = \sum_{i \ge r+1} \sigma_i(B) u_i v_i^T$. For $1 \le q \le \infty$, the
Schatten-$q$ norm $\|B\|_q$ is defined to be the vector $q$-norm of the singular values of $B$, i.e.
$\|B\|_q = \left(\sum_i \sigma_i^q(B)\right)^{1/q}$. Three special cases are of particular interest: when
$q = 1$, $\|B\|_1 = \sum_i \sigma_i(B)$ is the nuclear (or trace) norm of $B$ and will be denoted as
$\|B\|_*$; when $q = 2$, $\|B\|_2 = \sqrt{\sum_{i,j} B_{ij}^2}$ is the Frobenius norm of $B$ and will be
denoted as $\|B\|_F$; when $q = \infty$, $\|B\|_\infty = \sigma_1(B)$ is the spectral norm of $B$ that
we simply denote as $\|B\|$. For any matrix $U \in \mathbb{R}^{p \times n}$, we use
$P_U = U(U^T U)^{\dagger} U^T \in \mathbb{R}^{p \times p}$ to denote the projection operator onto the
column space of $U$. Throughout, we assume that $A$ is approximately rank $r$ in that for some integer
$0 < r \le \min(m_1, m_2)$, there is a significant gap between $\sigma_r(A)$ and $\sigma_{r+1}(A)$ and
the tail $\|A_{-\max(r)}\|_q = \left(\sum_{k \ge r+1} \sigma_k^q(A)\right)^{1/q}$ is small. The gap
assumption enables us to provide a theoretical upper bound on the accuracy of the estimator,
while it is not necessary in practice (see Section 4 for more details).
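The norms and the splitting $B = B_{\max(r)} + B_{-\max(r)}$ defined above can be sketched numerically as follows. This is only an illustration under stated assumptions: the function names `schatten_norm` and `split_max_r` are hypothetical and do not come from the paper, and NumPy is used as the linear algebra library.

```python
import numpy as np

def schatten_norm(B, q):
    """Schatten-q norm: the vector q-norm of the singular values of B."""
    s = np.linalg.svd(B, compute_uv=False)
    if np.isinf(q):
        return float(s.max())
    return float((s ** q).sum() ** (1.0 / q))

def split_max_r(B, r):
    """Return (B_max(r), B_-max(r)): the leading rank-r part and the remainder."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    B_max = (U[:, :r] * s[:r]) @ Vt[:r, :]
    return B_max, B - B_max

# Small random example (illustrative data only).
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 4))
nuc = schatten_norm(B, 1)          # nuclear norm
fro = schatten_norm(B, 2)          # Frobenius norm
spec = schatten_norm(B, np.inf)    # spectral norm
B_max, B_tail = split_max_r(B, 2)
```

The three special cases can be compared against `np.linalg.norm` with `ord` set to `"nuc"`, `"fro"` and `2`, and the spectral norm of `B_tail` should equal the third singular value of `B`.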

2.1 Exact Low-rank Matrix Recovery 

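We begin with the relatively easy case where $A$ is exactly of rank $r$. In this case, a simple
analysis indicates that $A$ can be perfectly recovered as shown in the following proposition.

Proposition 1 Suppose $A$ is of rank $r$, the SVD of $A_{11}$ is $A_{11} = U\Sigma V^T$, where
$U \in \mathbb{R}^{m_1 \times r}$, $\Sigma \in \mathbb{R}^{r \times r}$, and $V \in \mathbb{R}^{m_2 \times r}$. If

$$\mathrm{rank}\left([A_{11}\ A_{12}]\right) = \mathrm{rank}\left(\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix}\right) = \mathrm{rank}(A) = r,$$

then $\mathrm{rank}(A_{11}) = r$ and $A_{22}$ is exactly given by

$$A_{22} = A_{21}(A_{11})^{\dagger}A_{12} = A_{21}V\Sigma^{-1}U^T A_{12}. \qquad (2)$$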
Remark 1 Under the same conditions as Proposition 1, the NNM

$$\hat{A}_{22} = \arg\min_{B} \left\| \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & B \end{bmatrix} \right\|_* \qquad (3)$$

fails to guarantee the exact recovery of $A_{22}$. Consider the case where $A$ is a $p_1 \times p_2$ matrix
with all entries being 1. Suppose we observe arbitrary $m_1$ rows and $m_2$ columns, the NNM
would yield $\hat{A}_{22} \in \mathbb{R}^{(p_1 - m_1) \times (p_2 - m_2)}$ with all entries being
$1 \wedge \sqrt{\frac{m_1 m_2}{(p_1 - m_1)(p_2 - m_2)}}$ (see the corresponding lemma in the Supplement).
Hence when $m_1 m_2 < (p_1 - m_1)(p_2 - m_2)$, i.e., when the size of the observed blocks is much
smaller than that of $A$, the NNM fails to recover exactly the missing block $A_{22}$. See also the
numerical comparison in Section 4. The NNM (3) also fails to recover $A_{22}$ with high probability in
a random matrix setting where $A = B_1 B_2^T$ with $B_1 \in \mathbb{R}^{p_1 \times r}$ and
$B_2 \in \mathbb{R}^{p_2 \times r}$ being i.i.d. standard Gaussian matrices. See Lemma 3 in the Supplement
for further details. In addition to (3), other variations of NNM have been proposed in the
literature, including penalized NNM (Toh and Yun, 2010; Mazumder et al., 2010),

$$\hat{A}^{PN} = \arg\min_{Z} \left\{ \frac{1}{2} \sum_{(i_k, j_k) \in \Omega} \left(Z_{i_k, j_k} - A_{i_k, j_k}\right)^2 + t\|Z\|_* \right\}, \qquad (4)$$

and constrained NNM with relaxation (Cai et al., 2010),

$$\hat{A}^{CN} = \arg\min_{Z} \left\{ \|Z\|_* : |Z_{i_k, j_k} - A_{i_k, j_k}| \le t \text{ for } (i_k, j_k) \in \Omega \right\}, \qquad (5)$$

where $\Omega = \{(i_k, j_k) : A_{i_k, j_k} \text{ observed}, 1 \le i_k \le p_1, 1 \le j_k \le p_2\}$ and $t$ is the
tuning parameter. However, these NNM methods may not be suitable for SMC especially when only a small
number of rows and columns are observed. In particular, when $m_1 \ll p_1$, $m_2 \ll p_2$, and $A$ is well
spread in each block $A_{11}$, $A_{12}$, $A_{21}$, $A_{22}$, we have
$\left\|[A_{11}^T\ A_{21}^T]^T\right\|_* \ll \|A\|_*$ and $\|A_{12}\|_* \ll \|A\|_*$. Thus,

$$\left\| \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & 0 \end{bmatrix} \right\|_* \le \left\| \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} \right\|_* + \|A_{12}\|_* \ll \left\| \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \right\|_*.$$

In other words, imputing $A_{22}$ with all zeros yields a much smaller nuclear norm than
imputing with the true $A_{22}$ and hence NNM methods would generally fail to recover $A_{22}$
under such settings.
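The all-ones example in Remark 1 can be probed numerically. The sketch below is not the NNM itself: as a simplifying assumption it only searches over constant fill-ins $B = c \cdot \mathbf{1}\mathbf{1}^T$ on a grid of $c$ values, and compares the minimizing $c$ with the closed form $1 \wedge \sqrt{m_1 m_2 / ((p_1 - m_1)(p_2 - m_2))}$ stated above. The helper name and the dimensions are hypothetical.

```python
import numpy as np

def nuclear_norm_with_constant_fill(p1, p2, m1, m2, c):
    """Nuclear norm of the all-ones matrix whose missing block A22 is set to c."""
    M = np.ones((p1, p2))
    M[m1:, m2:] = c
    return np.linalg.norm(M, "nuc")

p1, p2, m1, m2 = 12, 10, 3, 2                    # illustrative sizes
grid = np.linspace(0.0, 1.5, 1501)               # candidate fill-in values, step 0.001
values = [nuclear_norm_with_constant_fill(p1, p2, m1, m2, c) for c in grid]
c_best = grid[int(np.argmin(values))]            # minimizer over the grid
c_formula = min(1.0, np.sqrt(m1 * m2 / ((p1 - m1) * (p2 - m2))))
```

With these sizes the observed blocks are small relative to $A$, so the nuclear norm is minimized by a fill-in value well below the true value 1.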














































Proposition 1 shows that, when $A$ is exactly low-rank, $A_{22}$ can be recovered precisely by
$A_{21}(A_{11})^{\dagger}A_{12}$. Unfortunately, this result heavily relies on the exactly low-rank
assumption that cannot be directly used for approximately low-rank matrices. In fact, even with a
small perturbation to $A$, the inverse of $A_{11}$ makes the formula $A_{21}(A_{11})^{\dagger}A_{12}$
unstable, which may lead to the failure of recovery. In practice, $A$ is often not exactly low rank
but approximately low rank. Thus for the rest of the paper, we focus on the latter setting.
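A minimal numerical sketch of the exact recovery formula (2) is given here, assuming a randomly generated exactly rank-$r$ matrix. The helper `rank_r_pinv` is a hypothetical name; it forms $V\Sigma^{-1}U^T$ from the leading $r$ singular triplets of $A_{11}$, which matches $(A_{11})^{\dagger}$ when $A_{11}$ has rank exactly $r$.

```python
import numpy as np

def rank_r_pinv(M, r):
    """V diag(1/sigma) U^T built from the leading r singular triplets of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (Vt[:r].T / s[:r]) @ U[:, :r].T

rng = np.random.default_rng(1)
p1, p2, m1, m2, r = 40, 30, 8, 7, 3              # illustrative sizes

# Exactly rank-r matrix, partitioned as in (1).
A = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))
A11, A12 = A[:m1, :m2], A[:m1, m2:]
A21, A22 = A[m1:, :m2], A[m1:, m2:]

# Rank conditions of Proposition 1, checked with an explicit tolerance.
rank_rows = np.linalg.matrix_rank(np.hstack([A11, A12]), tol=1e-8)
rank_cols = np.linalg.matrix_rank(np.vstack([A11, A21]), tol=1e-8)

# Formula (2): A22 = A21 V Sigma^{-1} U^T A12.
A22_hat = A21 @ rank_r_pinv(A11, r) @ A12
exact_error = np.linalg.norm(A22_hat - A22)
```

Truncating to the known rank $r$ is what keeps this computation stable here; applying an untruncated inverse to a perturbed $A_{11}$ is the instability described in the paragraph above.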

2.2 Approximate Low-rank Matrix Recovery 

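Let $A = U\Sigma V^T$ be the SVD of an approximately low rank matrix $A$ and partition
$U \in \mathbb{R}^{p_1 \times p_1}$, $V \in \mathbb{R}^{p_2 \times p_2}$ and $\Sigma \in \mathbb{R}^{p_1 \times p_2}$
into blocks as

$$U = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix}, \quad V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}, \qquad (6)$$

where $U$ has column blocks of widths $r$, $p_1 - r$ and row blocks of heights $m_1$, $p_1 - m_1$;
$V$ has column blocks of widths $r$, $p_2 - r$ and row blocks of heights $m_2$, $p_2 - m_2$; and
$\Sigma$ has column blocks of widths $r$, $p_2 - r$ and row blocks of heights $r$, $p_1 - r$.
Then $A$ can be decomposed as $A = A_{\max(r)} + A_{-\max(r)}$ where $A_{\max(r)}$ is of rank $r$ with the
largest $r$ singular values of $A$ and $A_{-\max(r)}$ is general but with small singular values. Then

$$A_{\max(r)} = U_{\bullet 1}\Sigma_1 V_{\bullet 1}^T = \begin{bmatrix} U_{11}\Sigma_1 V_{11}^T & U_{11}\Sigma_1 V_{21}^T \\ U_{21}\Sigma_1 V_{11}^T & U_{21}\Sigma_1 V_{21}^T \end{bmatrix}, \quad \text{and} \quad A_{-\max(r)} = U_{\bullet 2}\Sigma_2 V_{\bullet 2}^T, \qquad (7)$$

where the block matrix has column widths $m_2$, $p_2 - m_2$ and row heights $m_1$, $p_1 - m_1$.
Here and in the sequel, we use the notation $U_{\bullet k}$ and $U_{k \bullet}$ to denote
$[U_{1k}^T, U_{2k}^T]^T$ and $[U_{k1}, U_{k2}]$, respectively. Thus, $A_{\max(r)}$ can be viewed as a
rank-$r$ approximation to $A$ and obviously

$$U_{21}\Sigma_1 V_{21}^T = \{U_{21}\Sigma_1 V_{11}^T\}\{U_{11}\Sigma_1 V_{11}^T\}^{\dagger}\{U_{11}\Sigma_1 V_{21}^T\}.$$

We will use the observed $A_{11}$, $A_{12}$ and $A_{21}$ to obtain estimates of $U_{\bullet 1}$,
$V_{\bullet 1}$ and $\Sigma_1$ and subsequently recover $A_{22}$ using an estimated $U_{21}\Sigma_1 V_{21}^T$.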
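The block identity for $U_{21}\Sigma_1 V_{21}^T$ above can be checked numerically. The following is a sketch under the assumption that $U_{11}$ and $V_{11}$ have full column rank $r$ (which holds almost surely for the random matrix used here); the helper `rank_r_pinv` is a hypothetical name for a rank-$r$ pseudo-inverse.

```python
import numpy as np

def rank_r_pinv(M, r):
    """Pseudo-inverse of a rank-r matrix built from its leading r singular triplets."""
    u, d, vt = np.linalg.svd(M, full_matrices=False)
    return (vt[:r].T / d[:r]) @ u[:, :r].T

rng = np.random.default_rng(2)
p1, p2, m1, m2, r = 9, 8, 5, 4, 2                # illustrative sizes

A = rng.standard_normal((p1, p2))
U, s, Vt = np.linalg.svd(A)                      # full SVD: U is p1 x p1, Vt is p2 x p2
V = Vt.T
U11, U21 = U[:m1, :r], U[m1:, :r]                # blocks of the first r columns of U
V11, V21 = V[:m2, :r], V[m2:, :r]                # blocks of the first r columns of V
S1 = np.diag(s[:r])

A_max = U[:, :r] @ S1 @ V[:, :r].T               # rank-r part, as in (7)
lhs = U21 @ S1 @ V21.T                           # lower-right block of A_max
rhs = (U21 @ S1 @ V11.T) @ rank_r_pinv(U11 @ S1 @ V11.T, r) @ (U11 @ S1 @ V21.T)
identity_gap = np.linalg.norm(lhs - rhs)
```

Up to rounding error, `identity_gap` is zero, which is the sense in which the missing block of the rank-$r$ part is determined by the three observed blocks.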
When $r$ is known, i.e., we know where the gap is located in the singular values of $A$, a
simple procedure can be implemented to estimate $A_{22}$ as described in Algorithm 1 below by
estimating $U_{\bullet 1}$ and $V_{\bullet 1}$ using the principal components of $A_{\bullet 1}$ and $A_{1 \bullet}$.

Algorithm 1 Algorithm for Structured Matrix Completion with a given r
1: Input: $A_{11} \in \mathbb{R}^{m_1 \times m_2}$, $A_{12} \in \mathbb{R}^{m_1 \times (p_2 - m_2)}$, $A_{21} \in \mathbb{R}^{(p_1 - m_1) \times m_2}$.

2: Calculate the SVD of $A_{\bullet 1}$ and $A_{1 \bullet}$ to obtain $A_{\bullet 1} = U^{(1)}\Sigma^{(1)}V^{(1)T}$, $A_{1 \bullet} = U^{(2)}\Sigma^{(2)}V^{(2)T}$.

3: Suppose $M$, $N$ are orthonormal bases of the column spaces of $U_{11}$, $V_{11}$. We estimate the
column spaces of $U_{11}$ and $V_{11}$ by $\hat{M} = U^{(2)}_{[:, 1:r]}$, $\hat{N} = V^{(1)}_{[:, 1:r]}$.

4: Finally we estimate $A_{22}$ as

$$\hat{A}_{22} = A_{21}\hat{N}(\hat{M}^T A_{11}\hat{N})^{-1}\hat{M}^T A_{12}. \qquad (8)$$
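The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative translation rather than the authors' implementation: the function name `smc_known_rank` is hypothetical, the test matrix (rank 3 plus small Gaussian noise) is made up for the demonstration, and the inverse in (8) is applied through a linear solve.

```python
import numpy as np

def smc_known_rank(A11, A12, A21, r):
    """Sketch of Algorithm 1: estimate A22 from the observed blocks for a given r."""
    A_col = np.vstack([A11, A21])                # A_{.1}: observed columns, p1 x m2
    A_row = np.hstack([A11, A12])                # A_{1.}: observed rows,    m1 x p2
    _, _, V1t = np.linalg.svd(A_col, full_matrices=False)
    U2, _, _ = np.linalg.svd(A_row, full_matrices=False)
    M = U2[:, :r]                                # m1 x r, leading left vectors of A_{1.}
    N = V1t[:r, :].T                             # m2 x r, leading right vectors of A_{.1}
    core = M.T @ A11 @ N                         # r x r matrix inverted in (8)
    # Solve core @ X = M^T A12 instead of forming the inverse explicitly.
    return A21 @ N @ np.linalg.solve(core, M.T @ A12)

# Illustrative data: rank-3 signal plus small noise.
rng = np.random.default_rng(3)
p1, p2, m1, m2, r = 60, 50, 20, 15, 3
A = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))
A = A + 1e-4 * rng.standard_normal((p1, p2))
A22 = A[m1:, m2:]
A22_hat = smc_known_rank(A[:m1, :m2], A[:m1, m2:], A[m1:, :m2], r)
rel_err = np.linalg.norm(A22_hat - A22) / np.linalg.norm(A22)
```

Only the observed blocks are passed to the function; `A22` is used solely to measure the relative error of the estimate.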


However, Algorithm 1 has several major limitations. First, it relies on a given $r$ which is
typically unknown in practice. Second, the algorithm needs to calculate matrix divisions,
which may cause serious precision issues when the matrix is near-singular or the rank $r$ is
mis-specified. To overcome these difficulties, we propose another algorithm which essentially
first estimates $r$ with $\hat{r}$ and then applies Algorithm 1 to recover $A_{22}$. Before introducing
the algorithm of recovery without knowing $r$, it is helpful to illustrate the idea with heat maps in
Figures 1 and 2.

Our procedure has three steps.

1. First, we move the significant factors of $A_{\bullet 1}$ and $A_{1 \bullet}$ to the front by rotating
the columns of $A_{\bullet 1}$ and the rows of $A_{1 \bullet}$ based on the SVD,

$$A_{\bullet 1} = U^{(1)}\Sigma^{(1)}V^{(1)T}, \quad A_{1 \bullet} = U^{(2)}\Sigma^{(2)}V^{(2)T}.$$

After the transformation, we have $Z_{11}$, $Z_{12}$, $Z_{21}$,

$$Z_{11} = U^{(2)T}A_{11}V^{(1)}, \quad Z_{12} = U^{(2)T}A_{12}, \quad Z_{21} = A_{21}V^{(1)}, \quad Z_{22} = A_{22}.$$
[Figure 1: panel (a) heatmap of block-wise A (blocks A 11, A 12); panel (b) heatmap of block-wise Z after rotation (blocks Z 11, Z 12).]

Figure 1: Illustrative example with $A \in \mathbb{R}^{30 \times 30}$, $m_1 = m_2 = 10$. (A darker block
corresponds to larger magnitude.)

[Figure 2: panel (a) intermediate step when $\hat{r} = 9$, showing Z_11[1:9, 1:9] and Z_12[1:9, :]; panel (b) identify the position to truncate at $\hat{r} = 4$, showing Z_11[1:4, 1:4] and Z_12[1:4, :].]

Figure 2: Searching for the appropriate position to truncate from $\hat{r} = 10$ to 1.
Clearly $A$ and $Z$ have the same singular values since the transformation is orthogonal. As
shown in Figure 1(b), the amplitudes of the columns of $Z_{\bullet 1} = [Z_{11}^T\ Z_{21}^T]^T$ and the
rows of $Z_{1 \bullet} = [Z_{11}, Z_{12}]$ are decaying.

2. When $A$ is exactly of rank $r$, the $\{r+1, \cdots, m_1\}$th rows and $\{r+1, \cdots, m_2\}$th columns of
$Z$ are zero. Due to the small perturbation term $A_{-\max(r)}$, the back columns of $Z_{\bullet 1}$ and rows
of $Z_{1 \bullet}$ are small but non-zero. In order to recover $A_{\max(r)}$, the best rank $r$ approximation
to $A$, a natural idea is to first delete these back rows of $Z_{1 \bullet}$ and columns of $Z_{\bullet 1}$,
i.e. the $\{r+1, \cdots, m_1\}$th rows and $\{r+1, \cdots, m_2\}$th columns of $Z$.

However, since $r$ is unknown, it is unclear how many back rows and columns should be
removed. It will be helpful to have an estimate for $r$, $\hat{r}$, and then use $Z_{21,[:, 1:\hat{r}]}$,
$Z_{11,[1:\hat{r}, 1:\hat{r}]}$ and $Z_{12,[1:\hat{r}, :]}$ to recover $A_{22}$. It will be shown that a good
choice of $\hat{r}$ would satisfy that $Z_{11,[1:\hat{r}, 1:\hat{r}]}$ is non-singular and
$\|Z_{21,[:, 1:\hat{r}]} Z_{11,[1:\hat{r}, 1:\hat{r}]}^{-1}\| \le T_R$, where $T_R$ is some constant to be
specified later. Our final estimator for $r$ would be the largest $\hat{r}$ that satisfies this condition,
which can be identified recursively from $\min(m_1, m_2)$ to 1 (see Figure 2).

3. Finally, similar to (2), $A_{22}$ can be estimated by

$$\hat{A}_{22} = Z_{21,[:, 1:\hat{r}]} Z_{11,[1:\hat{r}, 1:\hat{r}]}^{-1} Z_{12,[1:\hat{r}, :]}. \qquad (9)$$

The method we propose can be summarized as the following algorithm.




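Algorithm 2 Algorithm of Structured Matrix Completion with unknown r
1: Input: $A_{11} \in \mathbb{R}^{m_1 \times m_2}$, $A_{12} \in \mathbb{R}^{m_1 \times (p_2 - m_2)}$, $A_{21} \in \mathbb{R}^{(p_1 - m_1) \times m_2}$. Thresholding level: $T_R$ (or $T_C$).

2: Calculate the SVD $A_{\bullet 1} = U^{(1)}\Sigma^{(1)}V^{(1)T}$, $A_{1 \bullet} = U^{(2)}\Sigma^{(2)}V^{(2)T}$.

3: Calculate $Z_{11} \in \mathbb{R}^{m_1 \times m_2}$, $Z_{12} \in \mathbb{R}^{m_1 \times (p_2 - m_2)}$, $Z_{21} \in \mathbb{R}^{(p_1 - m_1) \times m_2}$:

$$Z_{11} = U^{(2)T}A_{11}V^{(1)}, \quad Z_{12} = U^{(2)T}A_{12}, \quad Z_{21} = A_{21}V^{(1)}.$$

4: for $s = \min(m_1, m_2) : -1 : 1$ do (Use iteration to find $\hat{r}$)

5: Calculate $D_{R,s} \in \mathbb{R}^{(p_1 - m_1) \times s}$ (or $D_{C,s} \in \mathbb{R}^{s \times (p_2 - m_2)}$) by solving a linear equation system,

$$D_{R,s} = Z_{21,[:, 1:s]} Z_{11,[1:s, 1:s]}^{-1} \quad \left(\text{or } D_{C,s} = Z_{11,[1:s, 1:s]}^{-1} Z_{12,[1:s, :]}\right)$$

6: if $Z_{11,[1:s, 1:s]}$ is not singular and $\|D_{R,s}\| \le T_R$ (or $\|D_{C,s}\| \le T_C$) then

7: $\hat{r} = s$; break from the loop;

8: end if

9: end for

10: if ($\hat{r}$ is not valued) then $\hat{r} = 0$.

11: end if

12: Finally we calculate the estimate as

$$\hat{A}_{22} = Z_{21,[:, 1:\hat{r}]} Z_{11,[1:\hat{r}, 1:\hat{r}]}^{-1} Z_{12,[1:\hat{r}, :]}$$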
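A NumPy sketch of Algorithm 2 with row thresholding is shown below. It is illustrative only: the function name `smc_unknown_rank` is hypothetical, the numerical singularity test (condition number above `1e12`) is an assumption standing in for "is not singular", the test matrix is made up, and the threshold $T_R = 2\sqrt{p_1/m_1}$ is the choice discussed later in Corollary 1.

```python
import numpy as np

def smc_unknown_rank(A11, A12, A21, T_R):
    """Sketch of Algorithm 2 (row thresholding); returns (A22_hat, r_hat)."""
    m1, m2 = A11.shape
    _, _, V1t = np.linalg.svd(np.vstack([A11, A21]), full_matrices=False)
    U2, _, _ = np.linalg.svd(np.hstack([A11, A12]), full_matrices=False)
    V1 = V1t.T
    # Rotated blocks Z11, Z12, Z21 (step 3).
    Z11, Z12, Z21 = U2.T @ A11 @ V1, U2.T @ A12, A21 @ V1
    for s in range(min(m1, m2), 0, -1):
        Z11_s = Z11[:s, :s]
        # Numerical stand-in for "singular"; the 1e12 cutoff is an assumption.
        if np.linalg.cond(Z11_s) > 1e12:
            continue
        # D_{R,s} = Z21[:, :s] Z11_s^{-1}, obtained from a transposed linear solve.
        D = np.linalg.solve(Z11_s.T, Z21[:, :s].T).T
        if np.linalg.norm(D, 2) <= T_R:
            return D @ Z12[:s, :], s             # step 12 with r_hat = s
    # r_hat = 0: no admissible s was found, so the estimate is the zero matrix.
    return np.zeros((A21.shape[0], A12.shape[1])), 0

# Illustrative data: rank-3 signal plus small noise.
rng = np.random.default_rng(4)
p1, p2, m1, m2 = 60, 50, 30, 25
A = rng.standard_normal((p1, 3)) @ rng.standard_normal((3, p2))
A = A + 1e-4 * rng.standard_normal((p1, p2))
A22 = A[m1:, m2:]
T_R = 2 * np.sqrt(p1 / m1)
A22_hat, r_hat = smc_unknown_rank(A[:m1, :m2], A[:m1, m2:], A[m1:, :m2], T_R)
rel_err = np.linalg.norm(A22_hat - A22) / np.linalg.norm(A22)
```

Because the loop keeps the largest admissible $s$, the returned `r_hat` may exceed the nominal rank of the signal when some noise directions also pass the threshold; the column-thresholding variant would replace `D` by $Z_{11,[1:s,1:s]}^{-1} Z_{12,[1:s,:]}$ and compare against $T_C$.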


It can also be seen from Algorithm 2 that the estimator $\hat{r}$ is constructed based on either the
row thresholding rule $\|D_{R,s}\| \le T_R$ or the column thresholding rule $\|D_{C,s}\| \le T_C$. Discussions
on the choice between $D_{R,s}$ and $D_{C,s}$ are given in the next section. Let us focus for now
on the row thresholding based on $D_{R,s} = Z_{21,[:, 1:s]} Z_{11,[1:s, 1:s]}^{-1}$. It is important to note that
$Z_{21,[:, 1:r]}$ and $Z_{11,[1:r, 1:r]}$ approximate $U_{21}\Sigma_1$ and $\Sigma_1$, respectively. The idea
behind the proposed $\hat{r}$ is that when $s > r$, $Z_{21,[:, 1:s]}$ and $Z_{11,[1:s, 1:s]}$ are nearly singular
and hence $D_{R,s}$ may either be deemed singular or with unbounded norm. When $s = r$, $Z_{11,[1:s, 1:s]}$ is
non-singular with $\|D_{R,s}\|$ bounded by some constant, as we show in Theorem 2. Thus, we estimate
$\hat{r}$ as the largest $s$ such that $Z_{11,[1:s, 1:s]}$ is non-singular with $\|D_{R,s}\| \le T_R$.


3 Theoretical Analysis 


In this section, we investigate the theoretical properties of the algorithms introduced in Section
2. Upper bounds for the estimation errors of Algorithms 1 and 2 are presented in Theorems
1 and 2, respectively, and the lower-bound results are given in Theorem 3. These bounds
together establish the optimal rate of recovery over certain classes of approximately low-rank
matrices. The choices of tuning parameters $T_R$ and $T_C$ are discussed in Corollaries 1 and 2.


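Theorem 1 Suppose $\hat{A}_{22}$ is given by the procedure of Algorithm 1. Assume

$$\sigma_{r+1}(A) \le \frac{1}{2}\sigma_r(A) \cdot \sigma_{\min}(U_{11}) \cdot \sigma_{\min}(V_{11}). \qquad (10)$$

Then for any $1 \le q \le \infty$,

$$\left\|\hat{A}_{22} - A_{22}\right\|_q \le 3\|A_{-\max(r)}\|_q \left(1 + \frac{1}{\sigma_{\min}(U_{11})}\right)\left(1 + \frac{1}{\sigma_{\min}(V_{11})}\right). \qquad (11)$$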


Remark 2 It is helpful to explain intuitively why Condition (10) is needed. When $A$ is
approximately low-rank, the dominant low-rank component of $A$, $A_{\max(r)}$, serves as a good
approximation to $A$, while the residual $A_{-\max(r)}$ is "small". The goal is to recover $A_{\max(r)}$
well. Among the three observed blocks, $A_{11}$ is the most important and it is necessary to have
$A_{\max(r)}$ dominating $A_{-\max(r)}$ in $A_{11}$. Note that
$A_{11} = A_{\max(r),[1:m_1, 1:m_2]} + A_{-\max(r),[1:m_1, 1:m_2]}$,

$$\sigma_r\left(A_{\max(r),[1:m_1, 1:m_2]}\right) = \sigma_r(U_{11}\Sigma_1 V_{11}^T) \ge \sigma_{\min}(U_{11})\sigma_r(A)\sigma_{\min}(V_{11}),$$

$$\left\|A_{-\max(r),[1:m_1, 1:m_2]}\right\| = \left\|U_{12}\Sigma_2 V_{12}^T\right\| \le \sigma_{r+1}(A).$$

We thus require Condition (10) in Theorem 1 for the theoretical analysis.


Theorem 1 gives an upper bound for the estimation accuracy of Algorithm 1 under the
assumption that there is a significant gap between $\sigma_r(A)$ and $\sigma_{r+1}(A)$ for some known $r$.
It is noteworthy that there are possibly multiple values of $r$ that satisfy Condition (10). In such a
case, the bound (11) applies to all such $r$ and the largest $r$ yields the strongest result.

We now turn to Algorithm 2, where the knowledge of $r$ is not assumed. Theorem 2 below
shows that for properly chosen $T_R$ or $T_C$, Algorithm 2 can lead to accurate recovery of $A_{22}$.


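Theorem 2 Assume that there exists $r \in [1, \min(m_1, m_2)]$ such that

$$\sigma_{r+1}(A) \le \frac{1}{4}\sigma_r(A) \cdot \sigma_{\min}(U_{11}) \cdot \sigma_{\min}(V_{11}).$$

Let $T_R$ and $T_C$ be two constants satisfying

$$T_R \ge \frac{1.36}{\sigma_{\min}(U_{11})} + 0.35 \quad \text{and} \quad T_C \ge \frac{1.36}{\sigma_{\min}(V_{11})} + 0.35.$$

Then for $1 \le q \le \infty$, $\hat{A}_{22}$ given by Algorithm 2 satisfies

$$\left\|\hat{A}_{22} - A_{22}\right\|_q \le 6.5\, T_R \left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right)\|A_{-\max(r)}\|_q \qquad (12)$$

or

$$\left\|\hat{A}_{22} - A_{22}\right\|_q \le 6.5\, T_C \left(\frac{1}{\sigma_{\min}(U_{11})} + 1\right)\|A_{-\max(r)}\|_q \qquad (13)$$

when $\hat{r}$ is estimated based on the thresholding rule $\|D_{R,s}\| \le T_R$ or $\|D_{C,s}\| \le T_C$, respectively.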


Besides $\sigma_r(A)$ and $\sigma_{r+1}(A)$, Theorems 1 and 2 involve $\sigma_{\min}(U_{11})$ and
$\sigma_{\min}(V_{11})$, two important quantities that reflect how much the low-rank matrix
$A_{\max(r)} = U_{\bullet 1}\Sigma_1 V_{\bullet 1}^T$ is concentrated on the first $m_1$ rows and $m_2$ columns.
We should note that $\sigma_{\min}(U_{11})$ and $\sigma_{\min}(V_{11})$ depend on the singular vectors of $A$,
while $\sigma_r(A)$ and $\sigma_{r+1}(A)$ are the singular values of $A$. The lower bound
in Theorem 3 below indicates that $\sigma_{\min}(U_{11})$, $\sigma_{\min}(V_{11})$, and the singular values of $A$
together quantify the difficulty of the problem: recovery of $A_{22}$ gets harder as $\sigma_{\min}(U_{11})$
and $\sigma_{\min}(V_{11})$ become smaller or the $\{r+1, \cdots, \min(p_1, p_2)\}$th singular values become
larger. Define the class of approximately rank-$r$ matrices $\mathcal{F}_r(M_1, M_2)$ by

$$\mathcal{F}_r(M_1, M_2) = \left\{ A \in \mathbb{R}^{p_1 \times p_2} : \sigma_{\min}(U_{11}) \ge M_1,\ \sigma_{\min}(V_{11}) \ge M_2,\ \sigma_{r+1}(A) \le \frac{1}{2}\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}) \right\}. \qquad (14)$$
Theorem 3 (Lower Bound) Suppose $r \le \min(m_1, m_2, p_1 - m_1, p_2 - m_2)$ and $0 < M_1, M_2 < 1$,
then for all $1 \le q \le \infty$,

$$\inf_{\hat{A}_{22}} \sup_{A \in \mathcal{F}_r(M_1, M_2)} \frac{\|\hat{A}_{22} - A_{22}\|_q}{\|A_{-\max(r)}\|_q} \ge \frac{1}{4}\left(\frac{1}{M_1}\ [\ldots]\right)[\ldots] \qquad (15)$$

Remark 3 Theorems 1, 2 and 3 together immediately yield the optimal rate of recovery over
the class $\mathcal{F}_r(M_1, M_2)$,

$$\inf_{\hat{A}_{22}} \sup_{A \in \mathcal{F}_r(M_1, M_2)} \frac{\|\hat{A}_{22} - A_{22}\|_q}{\|A_{-\max(r)}\|_q} \asymp [\ldots] \quad \text{for } 0 < M_1, M_2 < 1,\ 1 \le q \le \infty. \qquad (16)$$


Since $U_{11}$ and $V_{11}$ are determined by the SVD of $A$, and $\sigma_{\min}(U_{11})$ and
$\sigma_{\min}(V_{11})$ are unknown based only on $A_{11}$, $A_{12}$, and $A_{21}$, it is not
straightforward to choose the tuning parameters $T_R$ and $T_C$ in a principled way. Theorem 2 also
does not provide information on the choice between row and column thresholding. Such a choice
generally depends on the problem setting. We consider below two settings where either the
rows/columns of $A$ are randomly sampled or $A$ is itself a random low-rank matrix. In such
settings, when $A$ is approximately rank $r$ and at least $O(r \log r)$ rows and columns are observed,
Algorithm 2 gives accurate recovery of $A$ with fully specified tuning parameters. We first consider
in Corollary 1 a fixed matrix $A$ with the observed $m_1$ rows and $m_2$ columns selected uniformly at
random.


Corollary 1 (Random Rows/Columns) Let $A = U\Sigma V^T$ be the SVD of $A \in \mathbb{R}^{p_1 \times p_2}$. Set

$$W_r^{(1)} = \frac{p_1}{r} \max_{1 \le i \le p_1} \sum_{j=1}^{r} U_{ij}^2 \quad \text{and} \quad W_r^{(2)} = \frac{p_2}{r} \max_{1 \le i \le p_2} \sum_{j=1}^{r} V_{ij}^2. \qquad (17)$$

Let $\Omega_1 \subset \{1, \cdots, p_1\}$ and $\Omega_2 \subset \{1, \cdots, p_2\}$ be respectively the index sets of
the observed $m_1$ rows and $m_2$ columns. Then $A$ can be decomposed as

$$A_{11} = A_{[\Omega_1, \Omega_2]}, \quad A_{21} = A_{[\Omega_1^c, \Omega_2]}, \quad A_{12} = A_{[\Omega_1, \Omega_2^c]}, \quad A_{22} = A_{[\Omega_1^c, \Omega_2^c]}. \qquad (18)$$

1. Let $\Omega_1$ and $\Omega_2$ be independently and uniformly selected from $\{1, \cdots, p_1\}$ and
$\{1, \cdots, p_2\}$ with or without replacement, respectively. Suppose there exists
$r \le \min(m_1, m_2)$ such that

$$\sigma_{r+1}(A) \le \frac{1}{6}\sigma_r(A)\sqrt{\frac{m_1 m_2}{p_1 p_2}},$$

and the number of rows and number of columns we observed satisfy

$$m_1 \ge 12.5\, r W_r^{(1)}(\log(r) + c), \quad m_2 \ge 12.5\, r W_r^{(2)}(\log(r) + c), \quad \text{for some constant } c > 1.$$

Algorithm 2 with either row thresholding with the break condition $\|D_{R,s}\| \le T_R$ where
$T_R = 2\sqrt{p_1/m_1}$ or column thresholding with the break condition $\|D_{C,s}\| \le T_C$ where
$T_C = 2\sqrt{p_2/m_2}$ satisfies, for all $1 \le q \le \infty$,

$$\left\|\hat{A}_{22} - A_{22}\right\|_q \le 29\|A_{-\max(r)}\|_q \sqrt{\frac{p_1 p_2}{m_1 m_2}}$$

with probability $\ge 1 - 4\exp(-c)$.

2. If $\Omega_1$ is uniformly randomly selected from $\{1, \cdots, p_1\}$ with or without replacement
($\Omega_2$ is not necessarily random), and there exists $r \le m_2$ such that

$$\sigma_{r+1}(A) \le \frac{1}{5}\sigma_r(A)\sigma_{\min}(V_{11})\sqrt{\frac{m_1}{p_1}}$$

and the number of observed rows satisfies

$$m_1 \ge 12.5\, r W_r^{(1)}(\log(r) + c) \quad \text{for some constant } c > 1, \qquad (19)$$

then Algorithm 2 with the break condition $\|D_{R,s}\| \le T_R$ where $T_R \ge 2\sqrt{p_1/m_1}$ satisfies, for all
$1 \le q \le \infty$,

$$\left\|\hat{A}_{22} - A_{22}\right\|_q \le 6.5\|A_{-\max(r)}\|_q\, T_R \left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right)$$

with probability $\ge 1 - 2\exp(-c)$.

3. Similarly, if $\Omega_2$ is uniformly randomly selected from $\{1, \cdots, p_2\}$ with or without
replacement ($\Omega_1$ is not necessarily random) and there exists $r \le m_1$ such that

$$\sigma_{r+1}(A) \le \frac{1}{5}\sigma_r(A)\sigma_{\min}(U_{11})\sqrt{\frac{m_2}{p_2}}$$

and the number of observed columns satisfies

$$m_2 \ge 12.5\, r W_r^{(2)}(\log(r) + c) \quad \text{for some constant } c > 1, \qquad (20)$$

then Algorithm 2 with the break condition $\|D_{C,s}\| \le T_C$ where $T_C \ge 2\sqrt{p_2/m_2}$ satisfies, for all
$1 \le q \le \infty$,

$$\left\|\hat{A}_{22} - A_{22}\right\|_q \le 6.5\|A_{-\max(r)}\|_q\, T_C \left(\frac{1}{\sigma_{\min}(U_{11})} + 1\right)$$

with probability $\ge 1 - 2\exp(-c)$.
Remark 4 The quantities $W_r^{(1)}$ and $W_r^{(2)}$ in Corollary 1 measure the variation of amplitude
of each row or each column of $A_{\max(r)}$. When $W_r^{(1)}$ and $W_r^{(2)}$ become larger, a small number
of rows and columns in $A_{\max(r)}$ would have larger amplitude than others, while these rows
and columns would be missed with large probability in the sampling of $\Omega$, which means the
problem would become harder. Hence, more observations for the matrix with larger $W_r^{(1)}$ and
$W_r^{(2)}$ are needed as shown in (19).
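To make the quantities in (17) and the sample-size requirement (19) concrete, the following sketch computes $W_r^{(1)}$ and $W_r^{(2)}$ from the leading singular vectors of a matrix. The function names are hypothetical, and the two test matrices (an all-ones matrix and a matrix with a single nonzero entry) are chosen only because they sit at the two extremes of these quantities.

```python
import numpy as np

def row_column_spread(A, r):
    """W_r^(1) and W_r^(2) from (17), using the top-r singular vectors of A."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    p1, p2 = A.shape
    W1 = p1 / r * np.max(np.sum(U[:, :r] ** 2, axis=1))    # max row energy of U
    W2 = p2 / r * np.max(np.sum(Vt[:r, :] ** 2, axis=0))   # max row energy of V
    return W1, W2

def rows_required(r, W1, c):
    """Right-hand side of (19): 12.5 r W_r^(1) (log r + c)."""
    return 12.5 * r * W1 * (np.log(r) + c)

p1, p2 = 20, 15
flat = np.ones((p1, p2))                 # energy spread evenly over rows and columns
spiky = np.zeros((p1, p2))
spiky[0, 0] = 1.0                        # all energy in one row and one column

W_flat = row_column_spread(flat, 1)
W_spiky = row_column_spread(spiky, 1)
```

For the evenly spread matrix both quantities equal 1, their smallest possible value, while for the single-entry matrix they equal $p_1$ and $p_2$, so the right-hand side of (19) grows by the same factor.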


We now consider the case where A is a random matrix. 

Corollary 2 (Random Matrix) Suppose $A \in \mathbb{R}^{p_1 \times p_2}$ is a random matrix generated by
$A = U\Sigma V^T$, where the singular values $\Sigma$ and singular space $V$ are fixed, and $U$ has
orthonormal columns that are randomly sampled based on the Haar measure. Suppose we observe the first
$m_1$ rows and first $m_2$ columns of $A$. Assume there exists $r < \frac{1}{2}\min(m_1, m_2)$ such that

$$\sigma_{r+1}(A) \le \frac{1}{5}\sigma_r(A)\sigma_{\min}(V_{11})\sqrt{\frac{m_1}{p_1}}.$$

Then there exist uniform constants $c, \delta > 0$ such that if $m_1 \ge cr$ and $\hat{A}_{22}$ is given by
Algorithm 2 with the break condition $\|D_{R,s}\| \le T_R$, where $T_R \ge 2\sqrt{p_1/m_1}$, we have for all
$1 \le q \le \infty$,

$$\left\|\hat{A}_{22} - A_{22}\right\|_q \le 6.5\|A_{-\max(r)}\|_q\, T_R \left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right)$$

with probability at least $1 - e^{-\delta m_1}$.

Parallel results hold for the case when $U$ is fixed and $V$ has orthonormal columns that are
randomly sampled based on the Haar measure, and we observe the first $m_1$ rows and first $m_2$
columns of $A$. Assume there exists $r < \frac{1}{2}\min(m_1, m_2)$ such that

$$\sigma_{r+1}(A) \le \frac{1}{5}\sigma_r(A)\sigma_{\min}(U_{11})\sqrt{\frac{m_2}{p_2}}.$$

Then there exist uniform constants $c, \delta > 0$ such that if $m_2 \ge cr$ and $\hat{A}_{22}$ is given by
Algorithm 2 with column thresholding with the break condition $\|D_{C,s}\| \le T_C$, where
$T_C \ge 2\sqrt{p_2/m_2}$, we have for all $1 \le q \le \infty$,

$$\left\|\hat{A}_{22} - A_{22}\right\|_q \le 6.5\|A_{-\max(r)}\|_q\, T_C \left(\frac{1}{\sigma_{\min}(U_{11})} + 1\right)$$

with probability at least $1 - e^{-\delta m_2}$.
4 Simulation


In this section, we show results from extensive simulation studies that examine the numerical
performance of Algorithm 2 on randomly generated matrices for various values of $p_1$, $p_2$, $m_1$
and $m_2$. We first consider settings where a gap between some adjacent singular values exists,
as required by our theoretical analysis. Then we investigate settings where the singular values
decay smoothly with no significant gap between adjacent singular values. The results show
that the proposed procedure performs well even when there is no significant gap, as long as
the singular values decay at a reasonable rate.

We also examine how sensitive the proposed estimators are to the choice of the threshold
and the choice between row and column thresholding. In addition, we compare the performance
of the SMC method with that of the NNM method. Finally, we consider a setting
similar to the real data application discussed in the next section. Results shown below are
based on 200-500 replications for each configuration. Additional simulation results on the effect
of $m_1$, $m_2$ and the ratio $p_1/m_1$ are provided in the supplement. Throughout, we generate the
random matrix $A$ from $A = U\Sigma V^T$, where the singular values of the diagonal matrix $\Sigma$ are chosen
accordingly for different settings. The singular spaces $U$ and $V$ are drawn randomly from
the Haar measure. Specifically, we generate i.i.d. standard Gaussian matrices
$\tilde U \in \mathbb{R}^{p_1 \times \min(p_1, p_2)}$ and $\tilde V \in \mathbb{R}^{p_2 \times \min(p_1, p_2)}$, then apply the QR decomposition to
$\tilde U$ and $\tilde V$ and assign $U$ and $V$ the Q part of the result.
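
The generation step just described can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used for the reported simulations: the function names `haar_orthonormal` and `random_matrix` are hypothetical, and the sign correction on the columns of Q is an added convention (it makes the draw independent of the sign choice made by the QR routine) that the text does not mention.

```python
import numpy as np


def haar_orthonormal(p, d, rng):
    """Return a p x d matrix with orthonormal columns via QR of a Gaussian matrix."""
    G = rng.standard_normal((p, d))   # i.i.d. standard Gaussian entries
    Q, R = np.linalg.qr(G)            # reduced QR: Q is p x d
    # Assumed convention (not in the text): flip column signs so diag(R) > 0.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def random_matrix(p1, p2, sing_vals, rng):
    """Build A = U diag(sing_vals) V^T with U, V drawn as above."""
    d = min(p1, p2)
    sing_vals = np.asarray(sing_vals, dtype=float)
    if sing_vals.shape != (d,):
        raise ValueError("need exactly min(p1, p2) singular values")
    U = haar_orthonormal(p1, d, rng)
    V = haar_orthonormal(p2, d, rng)
    return (U * sing_vals) @ V.T      # same as U @ diag(sing_vals) @ V.T
```

For example, `random_matrix(1000, 1000, np.arange(1, 1001) ** -1.0, np.random.default_rng(0))` gives a matrix whose singular values are $j^{-1}$, one of the decay patterns used in this section.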

We first consider the performance of Algorithm 2 when there is a significant gap between the $r$th
and $(r+1)$th singular values of $A$. We fix $p_1 = p_2 = 1000$, $m_1 = m_2 = 50$ and choose the
singular values as

    $g = 1, 2, \ldots, 10$,   $r = 4, 12$ and $20$.   (21)

Here $r$ is the rank of the major low-rank part $A_{\max(r)}$, and $g = \sigma_r(A)/\sigma_{r+1}(A)$ is the gap ratio between the
$r$th and $(r+1)$th singular values of $A$. The average loss of $\hat A_{22}$ from Algorithm 2 with row
thresholding and $T_R = 2\sqrt{p_1/m_1}$ under both the spectral norm and Frobenius norm losses is
given in Figure 3. The results suggest that our algorithm performs better when $r$ gets smaller
and the gap ratio $g = \sigma_r(A)/\sigma_{r+1}(A)$ gets larger. Moreover, even when $g = 1$, namely when there is no
significant gap between any adjacent singular values, our algorithm still works well for small
$r$. As will be seen in the following simulation studies, this is generally the case as long as the
singular values of $A$ decay sufficiently fast.



Figure 3: Spectral norm loss (left panel) and Frobenius norm loss (right panel) when there
is a gap between $\sigma_r(A)$ and $\sigma_{r+1}(A)$, plotted against the gap ratio $\sigma_r(A)/\sigma_{r+1}(A)$. The singular
values of $A$ are given by (21), $p_1 = p_2 = 1000$, and $m_1 = m_2 = 50$.


We now turn to the settings with the singular values being $\{j^{-\alpha}, j = 1, 2, \ldots, \min(p_1, p_2)\}$
and various choices of $\alpha$, $p_1$ and $p_2$. Hence, no significant gap between adjacent singular values
exists under these settings and we aim to demonstrate that our method continues to work well.
We first consider $p_1 = p_2 = 1000$, $m_1 = m_2 = 50$ and let $\alpha$ range from 0.3 to 2. Under this
setting, we also study how the choice of threshold affects the performance of our algorithm.
For simplicity, we report results only for row thresholding as results for column thresholding
are similar. The average loss of $\hat A_{22}$ from Algorithm 2 with $T_R \in \{c\sqrt{p_1/m_1}, c \in [1, 6]\}$
under both the spectral norm and Frobenius norm is given in Figure 4. In general, the
algorithm performs well provided that $\alpha$ is not too small and, as expected, the average loss
decreases with a higher decay rate in the singular values. This indicates that the existence of
a significant gap between adjacent singular values is not necessary in practice, provided that
the singular values decay sufficiently fast. When comparing the results across different choices
of the threshold, $c = 2$, as suggested in our theoretical analysis, is indeed the optimal choice.
Thus, in all subsequent numerical analysis, we fix $c = 2$.





Figure 4: Spectral norm loss (left panel) and Frobenius norm loss (right panel) as the thresholding
constant $c$ varies. The singular values of $A$ are $\{j^{-\alpha}, j = 1, 2, \ldots\}$ with $\alpha$ varying from
0.3 to 2, $p_1 = p_2 = 1000$, and $m_1 = m_2 = 50$.


To investigate the impact of row versus column thresholding, we let the singular value decay
rate be $\alpha = 1$, $p_1 = 300$, $p_2 = 3000$, and let $m_1$ and $m_2$ vary from 10 to 150. The original
matrix $A$ is generated the same way as before. We apply row and column thresholding with
$T_R = 2\sqrt{p_1/m_1}$ and $T_C = 2\sqrt{p_2/m_2}$. It can be seen from Figure 5 that when the observed
rows and columns are selected randomly, the results are not sensitive to the choice between
row and column thresholding.

(a) Spectral norm loss; column thresholding   (b) Frobenius norm loss; column thresholding
(c) Spectral norm loss; row thresholding   (d) Frobenius norm loss; row thresholding

Figure 5: Spectral and Frobenius norm losses with column/row thresholding. The singular
values of $A$ are $\{j^{-1}, j = 1, 2, \ldots\}$, $p_1 = 300$, $p_2 = 3000$, and $m_1, m_2 = 10, \ldots, 150$.

We next turn to the comparison between our proposed SMC algorithm and the penalized
NNM method. The penalized NNM estimator can be computed by the spectral
regularization algorithm of Mazumder et al. (2010) or the accelerated proximal gradient
algorithm of Toh and Yun (2010); these two methods provide similar results. We use
5-fold cross-validation to select the tuning parameter $t$. Details on the implementation can be
found in the Supplement.

We consider the setting where $p_1 = p_2 = 500$, $m_1 = m_2 = 50, 100$ and the singular
value decay rate $\alpha$ ranges from 0.6 to 2. As shown in Figure 6, the proposed SMC method
substantially outperforms the penalized NNM method with respect to both the spectral and
Frobenius norm loss, especially as $\alpha$ increases.




(a) Spectral norm loss   (b) Frobenius norm loss

Figure 6: Comparison of the proposed SMC method with the NNM method with 5-fold
cross-validation for the settings with singular values of $A$ being $\{j^{-\alpha}, j = 1, 2, \ldots\}$ for $\alpha$ ranging
from 0.6 to 2, $p_1 = p_2 = 500$, and $m_1 = m_2 = 50$ or 100.


Finally, we consider a simulation setting that mimics the ovarian cancer data application
considered in the next section, where $p_1 = 1148$, $p_2 = 1225$, $m_1 = 230$, $m_2 = 426$ and the
singular values of $A$ decay at a polynomial rate $\alpha$. Although the singular values of the full
matrix are unknown, we estimate the decay rate based on the singular values of the fully
observed 552 rows of the matrix from the TCGA study, denoted by $\{\sigma_j, j = 1, \ldots, 522\}$. A
simple linear regression of $\{\log(\sigma_j), j = 1, \ldots, 522\}$ on $\{\log(j), j = 1, \ldots, 522\}$ estimates $\alpha$ as
0.8777. In the simulation, we randomly generate $A \in \mathbb{R}^{p_1 \times p_2}$ such that the singular values are
fixed as $\{j^{-0.8777}, j = 1, 2, \ldots\}$. For comparison, we also obtained results for $\alpha = 1$ as well
as those based on the penalized NNM method with 5-fold cross-validation. As shown in Table 1,
the relative spectral norm loss and relative Frobenius norm loss of the proposed method are
reasonably small and substantially smaller than those from the penalized NNM method.
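
The log-log regression used to estimate the decay rate can be illustrated as follows. This is a sketch under the assumption that an ordinary least-squares line is fitted to $(\log j, \log \sigma_j)$; the function name `estimate_decay_rate` is hypothetical, and `np.polyfit` merely stands in for whatever regression routine was actually used.

```python
import numpy as np


def estimate_decay_rate(sing_vals):
    """Estimate alpha in sigma_j ~ j^(-alpha) by regressing log(sigma_j) on log(j)."""
    s = np.asarray(sing_vals, dtype=float)
    j = np.arange(1, s.size + 1)
    # Degree-1 fit returns (slope, intercept); the decay rate is minus the slope.
    slope, _intercept = np.polyfit(np.log(j), np.log(s), 1)
    return -slope
```

On singular values that follow an exact power law the estimate recovers the exponent; on real data it gives only a summary of the average decay.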


                    Relative spectral norm loss     Relative Frobenius norm loss
                    SMC         NNM                 SMC         NNM
$\alpha = 0.8777$   0.1253      0.4614              0.2879      0.6122
$\alpha = 1$        0.0732      0.4543              0.1794      0.5671

Table 1: Relative spectral norm loss ($\|\hat A_{22} - A_{22}\|/\|A_{22}\|$) and Frobenius norm loss
($\|\hat A_{22} - A_{22}\|_F/\|A_{22}\|_F$) for $p_1 = 1148$, $p_2 = 1225$, $m_1 = 230$, $m_2 = 426$ and singular values of $A$ being
$\{j^{-\alpha}, j = 1, 2, \ldots\}$.
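
The two relative losses reported in Table 1 can be computed as in the short sketch below. The function name `relative_losses` is hypothetical; the norms are the standard spectral and Frobenius norms from `numpy.linalg.norm`.

```python
import numpy as np


def relative_losses(A22_hat, A22):
    """Return (relative spectral norm loss, relative Frobenius norm loss)."""
    diff = np.asarray(A22_hat, dtype=float) - np.asarray(A22, dtype=float)
    spectral = np.linalg.norm(diff, 2) / np.linalg.norm(A22, 2)
    frobenius = np.linalg.norm(diff, "fro") / np.linalg.norm(A22, "fro")
    return spectral, frobenius
```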


5 Application in Genomic Data Integration 

In this section, we apply our proposed procedures to integrate multiple genomic studies of
ovarian cancer (OC). OC is the fifth leading cause of cancer mortality among women, accounting
for 14,000 deaths annually (Siegel et al. 2013). OC is a relatively heterogeneous
disease with 5-year survival rate varying substantially among different subgroups. The overall
5-year survival rate is near 90% for stage I cancer. But the majority of OC patients are
diagnosed with stage III/IV disease and tend to develop resistance to chemotherapy, resulting
in a 5-year survival rate of only about 30% (Holschneider and Berek, 2000). On the other hand, a
small minority of advanced cancers are sensitive to chemotherapy and do not relapse after
treatment completion. Such heterogeneity in disease progression is likely to be in part attributable
to variations in underlying biological characteristics of OC (Berchuck et al. 2005).
This heterogeneity and the lack of successful treatment strategies motivated multiple genomic
studies of OC to identify molecular signatures that can distinguish OC subtypes, and in turn
help to optimize and personalize treatment. For example, the Cancer Genome Atlas (TCGA)
comprehensively measured genomic and epigenetic abnormalities on high grade OC samples
(Cancer Genome Atlas Research Network, 2011). A gene expression risk score based on 193
genes, $G$, was trained on 230 training samples, denoted by TCGA$^{(t)}$, and shown to be highly
predictive of OC survival when validated on the TCGA independent validation set of size 322,
denoted by TCGA$^{(v)}$, as well as on several independent OC gene expression studies including
those from Bonome et al. (2005) (BONO), Dressman et al. (2007) (DRES) and Tothill et al.
(2008) (TOTH).

The TCGA study also showed that clustering of miRNA levels overlaps with gene-expression
based clusters and is predictive of survival. It would be interesting to examine whether combining
miRNA with $G$ could improve survival prediction when compared to $G$ alone. One
may use TCGA$^{(v)}$ to evaluate the added value of miRNA. However, TCGA$^{(v)}$ is of limited
sample size. Furthermore, since miRNA was only measured for the TCGA study, its utility
in prediction cannot be directly validated using these independent studies. Here, we apply
our proposed SMC method to impute the missing miRNA values and subsequently construct
prediction rules based on both $G$ and the imputed miRNA, denoted by $\widehat{\mathrm{miRNA}}$, for these
independent validation sets. To facilitate the comparison with the analysis based on TCGA$^{(v)}$
alone where miRNA measurements are observed, we only used the miRNA from TCGA$^{(t)}$ for
imputation and reserved the miRNA data from TCGA$^{(v)}$ for validation purposes. To improve
the imputation, we also included an additional 300 genes that were previously used in a prognostic
gene expression signature for predicting ovarian cancer survival (Denkert et al., 2009).
This results in a total of $m_2 = 426$ unique gene expression variables available for imputation.
Detailed information on the data used for imputation is shown in Figure 7. Prior to imputation,
all gene expression and miRNA levels are log transformed and centered to have mean
zero within each study to remove potential platform or batch effects. Since the observable
rows (indexing subjects) can be viewed as random whereas the observable columns (indexing
genes and miRNAs) are not random, we used row thresholding with threshold $T_R = 2\sqrt{p_1/m_1}$
as suggested in the theoretical and simulation results. For comparison, we also imputed data
using the penalized NNM method with tuning parameter $t$ selected via 5-fold cross-validation.


Figure 7: Imputation scheme for integrating multiple OC genomic studies.

                              Gene Expression Markers    miRNA Expression Markers
                              ($m_2 = 426$)                ($p_2 - m_2 = 799$)
TCGA Training Set (n=230)     observed                   observed
TCGA Test Set (n=322)         observed                   ?
Tothill Study (n=285)         observed                   ?
Dressman Study (n=117)        observed                   ?
Bonome Study (n=195)          observed                   ?


We first compared $\widehat{\mathrm{miRNA}}$ to the observed miRNA on TCGA$^{(v)}$. Our imputation yielded a
rank 2 matrix for $\widehat{\mathrm{miRNA}}$, and the correlations between the two right and two left singular vectors
of $\widehat{\mathrm{miRNA}}$ and those of the observed miRNA variables are .90, .71, .34, .14, substantially higher
than those from the NNM method, with the corresponding values 0.45, 0.06, 0.10, 0.05.
This suggests that the SMC imputation does a good job in recovering the leading projections
of the miRNA measurements and outperforms the NNM method.

To evaluate the utility of miRNA for predicting OC survival, we used TCGA$^{(t)}$ to select
117 miRNA markers that are marginally associated with survival with a nominal $p$-value
threshold of .05. We use the two leading principal components (PCs) of the 117 miRNA markers,
$\mathrm{miRNA}^{PC} = (\mathrm{miRNA}_1^{PC}, \mathrm{miRNA}_2^{PC})^T$, as predictors for the survival outcome in addition
to $G$. The imputation enables us to integrate information from 4 studies including TCGA$^{(v)}$,
which could substantially improve efficiency and prediction performance. We first assessed
the association between $\{\mathrm{miRNA}^{PC}, G\}$ and OC survival by fitting a stratified Cox model
(Kalbfleisch and Prentice, 2011) to the integrated data that combines TCGA$^{(v)}$ and the three
additional studies via either the SMC or NNM methods. In addition, we fit the Cox model
to (i) the TCGA$^{(v)}$ set alone with $\mathrm{miRNA}^{PC}$ obtained from the observed miRNA; and (ii) each
individual study separately with imputed $\mathrm{miRNA}^{PC}$. As shown in Table 2(a), the log hazard
ratio (logHR) estimates for $\mathrm{miRNA}^{PC}$ from the integrated analysis, based on both SMC and
NNM methods, are similar in magnitude to those obtained based on the observed miRNA
values with TCGA$^{(v)}$. However, the integrated analysis has substantially smaller standard error
(SE) estimates due to the increased sample sizes. The estimated logHRs are also reasonably
consistent across studies when separate models were fit to individual studies.

We also compared the prediction performance of the model based on $G$ alone to the model
that includes both $G$ and the imputed $\mathrm{miRNA}^{PC}$. Combining information from all 4 studies
via standard meta analysis, the average improvement in C-statistic was 0.032 (SE = 0.013) for
the SMC method and 0.001 (SE = 0.009) for the NNM method, suggesting that the imputed
$\mathrm{miRNA}^{PC}$ from the SMC method has much higher predictive value compared to that obtained
from the NNM method.

In summary, the results shown above suggest that our SMC procedure accurately recovers
the leading PCs of the miRNA variables. In addition, adding $\mathrm{miRNA}^{PC}$ obtained from imputation
using the proposed SMC method could significantly improve the prediction performance,
which confirms the value of our method for integrative genomic analysis. When compared to
the NNM method, the proposed SMC method produces summaries of miRNA that are more
correlated with the truth and yields leading PCs that are more predictive of OC survival.
Table 2: Shown in (a) are the estimates of the log hazard ratio (logHR) along with their
corresponding standard errors (SE) and p-values by fitting a stratified Cox model integrating
information from 4 independent studies with imputed miRNA based on the SMC method and
the nuclear norm minimization (NNM); and a Cox model to the TCGA test data with original
observed miRNA (Ori.). Shown also are the estimates for each individual study by fitting
separate Cox models with imputed miRNA.

(a) Integrated Analysis with Imputed miRNA vs Single study with observed miRNA

                           logHR                   SE                      p-value
                     Ori.    SMC     NNM     Ori.    SMC     NNM     Ori.    SMC     NNM
$G$                  .067    .143    .168    .041    .034    .028    .104    .000    .000
$\mathrm{miRNA}_1^{PC}$   -.012   -.019   -.013    .009    .006    .012    .218    .001    .283
$\mathrm{miRNA}_2^{PC}$    .023    .018   -.005    .014    .009    .014    .092    .039    .725

(b) Estimates for Individual Studies with Imputed miRNA from the SMC method

                           logHR                       SE                          p-value
                     TCGA   TOTH   DRES   BONO   TCGA   TOTH   DRES   BONO   TCGA   TOTH   DRES   BONO
$G$                  .051   .377   .174   .311   .048   .069   .132   .117   .286   .000   .187   .008
$\mathrm{miRNA}_1^{PC}$   -.014  -.021  -.031  -.010   .011   .012   .014   .014   .207   .082   .030   .484
$\mathrm{miRNA}_2^{PC}$    .014   .045  -.021   .036   .016   .018   .022   .019   .391   .009   .336   .054

(c) Estimates for Individual Studies with Imputed miRNA from the NNM method

                           logHR                       SE                          p-value
                     TCGA   TOTH   DRES   BONO   TCGA   TOTH   DRES   BONO   TCGA   TOTH   DRES   BONO
$G$                  .082   .405   .361   .258   .037   .066   .114   .088   .028   .000   .002   .003
$\mathrm{miRNA}_1^{PC}$   -.045   .016   .055  -.008   .021   .026   .031   .023   .034   .544   .076   .721
$\mathrm{miRNA}_2^{PC}$    .008  -.086  -.043   .019   .026   .027   .034   .029   .758   .002   .201   .496




6 Discussions 


The present paper introduced a new framework of SMC where a subset of the rows and 
columns of an approximately low-rank matrix are observed. We proposed an SMC method 
for the recovery of the whole matrix with theoretical guarantees. The proposed procedure 
significantly outperforms the conventional NNM method for matrix completion, which does 
not take into account the special structure of the observations. As shown by our theoretical 
and numerical analyses, the widely adopted NNM methods for matrix completion are not 
suitable for the SMC setting. These NNM methods perform particularly poorly when a small 
number of rows and columns are observed. 

The key assumption in matrix completion is that the matrix is approximately low rank.
This is reasonable in the ovarian cancer application since, as indicated in the results from
the TCGA study (Cancer Genome Atlas Research Network, 2011), the patterns observed in
the miRNA signature are highly correlated with the patterns observed in the gene expression
signature. This suggests a high correlation among the selected gene expression and miRNA
variables. Results from the imputation based on the approximate low rank assumption given
in Section 5 are also encouraging, with promising correlations with true signals and good
prediction performance from the imputed miRNA signatures. We expect that this imputation
method will also work well in genotyping and sequencing applications, particularly for regions
with reasonably high linkage disequilibrium.

Another main assumption needed in the theoretical analysis is that there is a
significant gap between the $r$th and $(r+1)$th singular values of $A$. This assumption may not
be valid in practice. In particular, the singular values of the ovarian dataset analyzed in
Section 5 decrease smoothly without a significant gap. However, it has been shown in the
simulation studies presented in Section 4 that, even when there is no significant gap between
any adjacent singular values of the matrix to be recovered, the proposed SMC method works
well as long as the singular values decay sufficiently fast. Theoretical analysis for the proposed
SMC method under more general patterns of singular value decay warrants future research.

To implement the proposed Algorithm 2, major decisions include the choice of threshold
values and the choice between column thresholding and row thresholding. Based on both theoretical
and numerical studies, optimal threshold values can be set as $T_C = 2\sqrt{p_2/m_2}$ for
column thresholding and $T_R = 2\sqrt{p_1/m_1}$ for row thresholding. Simulation results in Section 4
show that when both rows and columns are randomly chosen, the results are very similar. In
real data applications, the choice between row thresholding and column thresholding depends
on whether the rows or columns are more “homogeneous”, or closer to being randomly
sampled. For example, in the ovarian cancer dataset analyzed in Section 5, the rows correspond
to the patients and the columns correspond to the gene expression levels and miRNA
levels. Thus the rows are closer to a random sample than the columns, and consequently it is more
natural to use row thresholding in this case.

We have shown both theoretically and numerically in Sections 3 and 4 that Algorithm
2 provides a good recovery of $A_{22}$. However, the naive implementation of this algorithm
requires $\min(m_1, m_2)$ matrix inversions and multiplication operations in the for loop that
calculates $\|D_{R,s}\|$ (or $\|D_{C,s}\|$), $s \in \{\hat r, \hat r + 1, \ldots, \min(m_1, m_2)\}$. Taking into account the
relationship among $D_{R,s}$ (or $D_{C,s}$) for different $s$'s, it is possible to simultaneously calculate
all $\|D_{R,s}\|$ (or $\|D_{C,s}\|$) and accelerate the computations. For reasons of space, we leave optimal
implementation of Algorithm 2 as future work.
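
To make the cost of the naive loop concrete, the following sketch implements a thresholded completion of the unobserved block in the spirit of the row-thresholding procedure discussed above. It should not be read as Algorithm 2 itself, which is not reproduced in this section: the rotation of the observed blocks by singular vectors, the definition of $D_{R,s}$ as the leading-$s$ version of $Z_{21} Z_{11}^{-1}$, and the zero fallback are all assumptions made for illustration, and the name `smc_row_threshold` is hypothetical. Only the default threshold $2\sqrt{p_1/m_1}$ is taken from the text.

```python
import numpy as np


def smc_row_threshold(A11, A12, A21, T_R=None):
    """Estimate the unobserved block A22 from the observed blocks A11, A12, A21.

    Assumed construction (illustrative only): rotate by singular vectors of the
    observed rows/columns, then pick the largest s whose D = Z21[:, :s] Z11[:s, :s]^{-1}
    has spectral norm at most T_R.
    """
    m1, _m2 = A11.shape
    p1 = m1 + A21.shape[0]
    if T_R is None:
        T_R = 2.0 * np.sqrt(p1 / m1)  # threshold recommended in the text

    # Left singular vectors of the observed rows, right ones of the observed columns.
    N, _, _ = np.linalg.svd(np.hstack([A11, A12]), full_matrices=False)
    _, _, Mt = np.linalg.svd(np.vstack([A11, A21]), full_matrices=False)
    Z11 = N.T @ A11 @ Mt.T
    Z12 = N.T @ A12
    Z21 = A21 @ Mt.T

    # Naive loop: one matrix inversion per candidate s, from large to small.
    for s in range(min(Z11.shape), 0, -1):
        try:
            Z11_inv = np.linalg.inv(Z11[:s, :s])
        except np.linalg.LinAlgError:
            continue  # exactly singular leading block: try a smaller s
        D = Z21[:, :s] @ Z11_inv
        if np.all(np.isfinite(D)) and np.linalg.norm(D, 2) <= T_R:
            return D @ Z12[:s, :], s
    # No candidate passed the threshold: fall back to the zero estimate.
    return np.zeros((A21.shape[0], A12.shape[1])), 0
```

Each pass through the loop inverts an $s \times s$ block from scratch, which is the cost the paragraph above refers to; an implementation that reuses work across successive values of $s$ would avoid most of it.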

Supplement to “Structured Matrix Completion With
Applications to Genomic Data Integration” $^1$

Tianxi Cai, T. Tony Cai and Anru Zhang

Abstract 

In this supplement we provide additional simulation results and the proofs of the main 
theorems. Some key technical tools used in the proofs of the main results are also developed 
and proved. 


1 Additional Simulation Results 

We consider the effect of the number of the observed rows and columns on the estimation
accuracy. We let $p_1 = p_2 = 1000$, let the singular values of $A$ be $\{j^{-1}, j = 1, 2, \ldots\}$ and let
$m_1$ and $m_2$ vary from 10 to 210. The singular spaces $U$ and $V$ are again generated randomly
from the Haar measure. The estimation errors of $\hat A_{22}$ from Algorithm 2 with row thresholding
and $T_R = 2\sqrt{p_1/m_1}$ over different choices of $m_1$ and $m_2$ are shown in Figure 8. As expected,
the average loss decreases as $m_1$ or $m_2$ grows. Another interesting fact is that the average
loss is approximately symmetric with respect to $m_1$ and $m_2$. This implies that even with
different numbers of observed rows and columns, Algorithm 2 has similar performance with
row thresholding or column thresholding.

1 Tianxi Cai is Professor of Biostatistics, Department of Biostatistics, Harvard School of Public Health, 
Cai and Anru Zhang was supported in part by NSF Grant DMS-1208982 and NIH Grant R01 CA127334.

(a) Spectral norm loss   (b) Frobenius norm loss

Figure 8: Losses for the settings with singular values of $A$ being $\{j^{-1}, j = 1, 2, \ldots\}$, $p_1 = p_2 =
1000$, $m_1, m_2 = 10, \ldots, 210$.

We are also interested in the performance of Algorithm 2 as $p_1$ and the ratio $m_1/p_1$ vary.
To this end, we consider the setting where $p_2 = 1000$, $m_2 = 50$, and the singular values of $A$
are chosen as $\{j^{-1}, j = 1, 2, \ldots\}$. The results are shown in Figure 9. It can be seen that when
$m_1/p_1$ increases, the recovery is generally more accurate; when $m_1/p_1$ is kept constant,
the average loss decreases but does not converge to zero as $p_1$ increases.

2 Technical Tools 

We collect important technical tools in this section. The first lemma is about the inequalities 
of singular values in the perturbed matrix. 

Lemma 1 Suppose $X \in \mathbb{R}^{p \times n}$, $Y \in \mathbb{R}^{p \times n}$, $\mathrm{rank}(X) = a$, $\mathrm{rank}(Y) = b$.

1. $\sigma_{a+b+1-r}(X + Y) \le \min(\sigma_{a+1-r}(X), \sigma_{b+1-r}(Y))$ for $r \ge 1$;

2. if we further have $X^T Y = 0$, we must have $a + b \le n$ and $\sigma_r(X + Y) \ge \max(\sigma_r(X), \sigma_r(Y))$ for
$r \ge 1$.

Figure 9: Losses for settings with singular values of $A$ being $\{j^{-1}, j = 1, 2, 3, \ldots\}$, $p_2 = 1000$,
$m_2 = 50$, $m_1/p_1 = 1/4, 1/12, 1/20, 1/28, 1/36$, and $p_1 = 100, \ldots, 100{,}000$.


Lemma 2 Suppose $X \in \mathbb{R}^{p \times n}$, $Y \in \mathbb{R}^{n \times m}$ are two arbitrary matrices, and denote by $\|\cdot\|_q$ and $\|\cdot\|$
the Schatten-$q$ norm and spectral norm respectively; then we have

$$\|XY\|_q \le \|X\|_q \cdot \|Y\|. \tag{22}$$

The following two lemmas provide examples illustrating that NNM fails to recover $A_{22}$.

Lemma 3 Assume A = B\Blf, where B i e M PlXr and B 2 e M p2Xr are two i.i.d. standard 
Gaussian matrices. Let A is divided into blocks as ((Tj) . Suppose 

r < |^min {pi,p 2 ), rn x < ^p u m 2 < ^ p 2 , (23) 

then the NNM (3) fails to recover $A_{22}$ with probability at least $1 - 12\exp(-\min(p_1, p_2)/400)$.

Lemma 4 Denote l p as the p-dimensional vector with all entries 1. Suppose A = l Pl • l J p2 , 
and A is divided into blocks as Q. Then the NNM (J3]) yields 

The following result is on the norm of a random submatrix of a given orthonormal matrix.

Lemma 5 Suppose $U \in \mathbb{R}^{p \times d}$ is a fixed matrix with orthonormal columns (hence $d \le p$).
Denote $W = \max_{1 \le i \le p} \frac{p}{d} \sum_{j=1}^{d} u_{ij}^2$. Suppose we uniformly randomly draw $n$ rows (with or
without replacement) from $U$, denote the index set as $\Omega$, and denote

$$U_\Omega = \begin{bmatrix} U_{\Omega(1)\cdot} \\ \vdots \\ U_{\Omega(n)\cdot} \end{bmatrix}.$$

When $n \ge \frac{4Wd(\log d + c)}{(1-\alpha)^2}$ for some $0 < \alpha < 1$ and $c > 1$, we have

$$\sigma_{\min}(U_\Omega) \ge \sqrt{\frac{\alpha n}{p}}$$

with probability $1 - 2e^{-c}$.
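
As a small illustration of the quantity $W$ in Lemma 5, the sketch below computes it for a matrix with orthonormal columns, assuming $W$ is the row coherence $\max_i \frac{p}{d}\sum_j u_{ij}^2$ (the formula is partly garbled in the source, so this reading is an assumption). The function name `row_coherence` is hypothetical.

```python
import numpy as np


def row_coherence(U):
    """W = max_i (p/d) * sum_j U[i, j]**2 for a p x d matrix with orthonormal columns."""
    U = np.asarray(U, dtype=float)
    p, d = U.shape
    return (p / d) * np.max(np.sum(U ** 2, axis=1))
```

With this reading, $W = 1$ when all rows carry equal weight and $W = p/d$ when the columns are standard basis vectors, so a larger $W$ means that a few rows dominate and more sampled rows are needed.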


The following result is about the spectral norm of the submatrix of a random orthonormal
matrix.


Lemma 6 Suppose $U \in \mathbb{R}^{p \times d}$ ($d \le p$) has random orthonormal columns with Haar measure.
For all $0 < \alpha_1 < 1 < \alpha_2$, there exist constants $C, \delta > 0$ depending only on $\alpha_1, \alpha_2$ such
that when $p \ge n \ge \min\{Cd, p\}$, we have

$$\sqrt{\frac{\alpha_1 n}{p}} \le \sigma_{\min}(U_{[1:n,:]}) \le \|U_{[1:n,:]}\| \le \sqrt{\frac{\alpha_2 n}{p}} \tag{24}$$

with probability at least $1 - \exp(-\delta n)$.
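
The concentration described in Lemma 6 is easy to observe numerically. The sketch below draws a matrix with orthonormal columns by QR of a Gaussian matrix (the same construction used in the simulation section), keeps the first $n$ rows, and reports the extreme singular values relative to $\sqrt{n/p}$. The function name `submatrix_singular_ratios` is hypothetical, and the sketch is only an empirical check on one draw, not part of the proof.

```python
import numpy as np


def submatrix_singular_ratios(p, d, n, rng):
    """Return (sigma_min, sigma_max) of the first n rows of a random p x d
    orthonormal matrix, each divided by sqrt(n / p)."""
    U, _ = np.linalg.qr(rng.standard_normal((p, d)))  # p x d, orthonormal columns
    s = np.linalg.svd(U[:n, :], compute_uv=False)
    scale = np.sqrt(n / p)
    return s.min() / scale, s.max() / scale
```

When $n$ is a large multiple of $d$, both ratios are close to 1, which matches the form of the bound with $\alpha_1$ slightly below 1 and $\alpha_2$ slightly above 1.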


Proof of the Technical Lemmas


Proof of Lemma 1

1. First, by a well-known fact about best low-rank approximation,

$$\sigma_{a+b+1-r}(X + Y) = \min_{M \in \mathbb{R}^{p \times n},\ \mathrm{rank}(M) \le a+b-r} \|X + Y - M\|.$$

Hence,

$$\sigma_{a+b+1-r}(X + Y) \le \|X + Y - (X_{\max(a-r)} + Y)\| = \|X_{-\max(a-r)}\| = \sigma_{a+1-r}(X);$$

similarly $\sigma_{a+b+1-r}(X + Y) \le \sigma_{b+1-r}(Y)$.

2. When we further have $X^T Y = 0$, we know the column spaces of $X$ and $Y$ are orthogonal,
so $\mathrm{rank}(X + Y) = \mathrm{rank}(X) + \mathrm{rank}(Y) = a + b$, which means $a + b \le n$. Next,
note that

$$(X + Y)^T (X + Y) = X^T X + Y^T Y + X^T Y + Y^T X = X^T X + Y^T Y.$$

If we denote by $\lambda_r(\cdot)$ the $r$-th largest eigenvalue of a matrix, then we have

$$\sigma_r^2(X + Y) = \lambda_r\big((X + Y)^T (X + Y)\big) = \lambda_r(X^T X + Y^T Y) \ge \max\big(\lambda_r(X^T X), \lambda_r(Y^T Y)\big) = \max\big(\sigma_r^2(X), \sigma_r^2(Y)\big).$$

□

Proof of Lemma 2 Since

$$\|XY\|_q = \Big(\sum_i \sigma_i^q(XY)\Big)^{1/q}, \qquad \|X\|_q = \Big(\sum_i \sigma_i^q(X)\Big)^{1/q},$$

it suffices to show $\sigma_i(XY) \le \sigma_i(X)\|Y\|$. To this end, we have

$$\sigma_i(XY) = \min_{M \in \mathbb{R}^{p \times m},\ \mathrm{rank}(M) \le i-1} \|XY - M\| \le \|XY - X_{\max(i-1)} Y\| = \|X_{-\max(i-1)} Y\| \le \sigma_i(X)\|Y\|,$$

which finishes the proof of this lemma. □
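
Inequality (22) can be checked numerically on random matrices. The sketch below assumes the usual definition of the Schatten-$q$ norm as the $\ell_q$ norm of the singular values, with $q = \infty$ giving the spectral norm; the function name `schatten_norm` is hypothetical.

```python
import numpy as np


def schatten_norm(X, q):
    """Schatten-q norm: the l_q norm of the singular values (q = inf gives the spectral norm)."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    if np.isinf(q):
        return s.max()
    return float(np.sum(s ** q) ** (1.0 / q))
```

For any conformable `X` and `Y`, `schatten_norm(X @ Y, q)` should not exceed `schatten_norm(X, q) * np.linalg.norm(Y, 2)` up to rounding error, for every $1 \le q \le \infty$.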

Proof of Lemma [3j Since B i and B 2 and their submatrices are all i.i.d. standard matrices, 


by the random matrix theory (Corollary 5.35 in Vershynin (2010)), for t > 0, we have with 
probability at least 1 — 12 exp(—t 2 /2), the following inequalities hold, 


A r (A) >X min (Bi)X min (B 2 ) > (VpT - a /r - t)(y/p 2 - y/r - t) 

@ /19 A /19 

- 20 ^“' 20 ^"* 


(25) 


f /l A /2l A 

( ^VPi + t ) ( 2qV^2 + M (26) 


and 


11^-211| —||-Bl,[(mi+l):pi,:]-S2,[l:m 2 ,:]ll - (\/Pl + + t)(y/m^ + y/r + t) 




(27) 
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Denote 


and set t 


A 0 — 

^min (y/p{,y/pi). Since ||v4 0 |U 


-*411 dLl2 
A21 0 

< P 1# ||* + 11^-21 II*) 


we have 


\ 

IP||* > J > 1 - 12exp(— min(pi,p 2 )/400) (28) 

and 

/ 264 _\ 

p (Poll* < ^VPiP 2 J > 1 - 12exp(— p 2 ) / 400). (29) 

Hence, with probability at least 1 — 12exp(— min(pi,p 2 )/400), ||H 0 ||* < ||H||*, which implies 
that the NNM (|3]) fails to recover H 22 . □ 


Proof of Lemma [ 4 ], For convenience, we denote x A y = min(x, y) for any two real numbers 
x,y. First, we can extend the unit vectors ^=l mi , ^==lma, ^=!pi-mi and ^=l P2 _ m2 
into orthogonal matrices, which we denote as U mi G JJ m2 G M m 2X1712 , f/ pi _ mi G 

R (pi-mi)x( P1 - mi ) ; U p2 - m2 G M(P2-m 2 )x(p 2 -m 2 )_ Nexfcj for all ^ g R (pi-mi)X(p 2 -m 2 ), wg mugt 


have 


A\\ A 


12 


A 2 1 An 


22 


U' mi 


0 

En 


U J 

^ pi—rni 



An 

l 

CN 



^21 

1 

CN 
^ CN 



u, 


m 2 


0 Up 2 —m 2 


E 


12 


see 


E'il Epx —mi A -22 Ep 2 m 2 

where En G M miXm2 ,i?i 2 G W miX ( p2 ~ m2 \ E 2 i G M(P 1-mi ) xm 2 are with the first entry y/mim 2 , 
\/mi(p 2 — m 2 ) and \Jm 2 (j> 1 — mj] respectively and other entries 0. Therefore, we can 

En E 12 y/‘mim 2 \Jmi{p 2 - m 2 ) 

E2l —mi y 4 22 D p2 _ m2 J || || L\/' m 2(Pi - mi) [U J pi —mi A 22 U P2 —m 2 ] [1 ;1] 

and the equality holds if and only if U p] _ ni] A', n U p2 _ m2 is zero except the first entry. 

By some calculation, we can see the nuclear norm of 2-by-2 matrix 

y/mim 2 \Jmi (p 2 ~ m 2 ) 

\/m 2 (pi - mi) x 
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achieves its minimum if and only if 


x 


= y/mim 2 A \J(pi - mi)(p 2 - m 2 ). 


Hence, A' 22 achieves the minimum of 


An An 
A21 A'22 


if and only if 


77T A’ TT 

u p 1 — m 1 22 P2—m 2 


which means the minimizer A' 22 = 


yj‘m 1 m 2 A ^/(pi - m!)(p 2 - m 2 ) 0 
0 0 


mi m 2 


(p\—m\)(p2—iri2) ^ ^ J m i"^P2—1712" ^ 


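Proof of Lemma 5 The proof of this lemma relies on the operator-Bernstein inequality for
sampling (Theorem 1 in Gross and Nesme (2010)). For two symmetric matrices $A$, $B$, we
say $A \preceq B$ if $B - A$ is positive semidefinite. By assumption, $\{U_{\Omega(j)\cdot}, j = 1, \ldots, n\}$ are uniformly
random samples (with or without replacement) from $\{U_{i\cdot}, i = 1, \ldots, p\}$. Suppose

$$X_i = U_{i\cdot}^T U_{i\cdot} - \frac{1}{p} I_d, \quad i = 1, \ldots, p, \tag{30}$$

then $X_i$ are symmetric matrices, and $X_{\Omega(j)}$, $j = 1, \ldots, n$, are uniformly random samples (with or
without replacement) from $\{X_1, \ldots, X_p\}$. In addition, we have

$$E X_{\Omega(j)} = \frac{1}{p} \sum_{i=1}^{p} U_{i\cdot}^T U_{i\cdot} - \frac{1}{p} I_d = \frac{1}{p} U^T U - \frac{1}{p} I_d = 0,$$

$$\|X_i\| \le \max_{1 \le i \le p} \Big\| U_{i\cdot}^T U_{i\cdot} - \frac{1}{p} I_d \Big\| \le \max_{1 \le i \le p} \max\Big\{ \|U_{i\cdot}^T U_{i\cdot}\|, \frac{1}{p}\|I_d\| \Big\} \le \frac{Wd}{p},$$

$$E X_{\Omega(j)}^2 = \frac{1}{p} \sum_{i=1}^{p} \Big( U_{i\cdot}^T U_{i\cdot} - \frac{1}{p} I_d \Big)^2 = \frac{1}{p} \sum_{i=1}^{p} \Big( U_{i\cdot}^T U_{i\cdot} U_{i\cdot}^T U_{i\cdot} - \frac{2}{p} U_{i\cdot}^T U_{i\cdot} + \frac{1}{p^2} I_d \Big)$$

$$= \frac{1}{p} \sum_{i=1}^{p} \|U_{i\cdot}\|_2^2 \, U_{i\cdot}^T U_{i\cdot} - \frac{1}{p^2} I_d \preceq \frac{Wd}{p} \cdot \frac{1}{p} \sum_{i=1}^{p} U_{i\cdot}^T U_{i\cdot} - \frac{1}{p^2} I_d = \frac{Wd - 1}{p^2} I_d.$$

For all $0 < \alpha < 1$, by Theorem 1 in Gross and Nesme (2010),

$$P\Big( \sigma_{\min}(U_\Omega) < \sqrt{\tfrac{\alpha n}{p}} \Big) = P\Big( \lambda_{\min}(U_\Omega^T U_\Omega) < \tfrac{\alpha n}{p} \Big) = P\Big( \lambda_{\min}\Big( \sum_{j=1}^{n} X_{\Omega(j)} \Big) < -\tfrac{(1-\alpha) n}{p} \Big)$$

$$\le P\Big( \Big\| \sum_{j=1}^{n} X_{\Omega(j)} \Big\| > \tfrac{(1-\alpha) n}{p} \Big) \le 2d \exp\Big( -\min\Big\{ \frac{((1-\alpha) n/p)^2}{4n(Wd-1)/p^2}, \frac{(1-\alpha) n/p}{2Wd/p} \Big\} \Big) \le 2\exp(-c).$$

The last inequality is due to the assumption that

$$n \ge \frac{4Wd(\log d + c)}{(1-\alpha)^2}.$$

□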


Proof of Lemma [fij By the assumption on n, we have n > p or n > Cd. When n > p. we 


know n — p and Uy\ m A = U is an orthogonal matrix, which means (24) is clearly true. Hence, 
we only need to prove the theorem under the assumption that p > n is true. In this case, we 
must have n > Cd. 

Since U has random orthonormal columns with Haar measure, for any fixed vector v G M d , 
Uv is identitical distributed as 


;r|| 2 1 (aq, aq, • • ■ , x p ), where aq, • ■ ■ ,x p rs-/ iV(0,l) 


Hence, Uy i :n , : ]V is identical distributed with ||x|| 2 x (aq, • • • ,x n ) and 


|| t/[i:n,:]H ||2 is identical distributed as 


\ i=1 


1=1 


which is the also the square root of Beta distribution. Denote 


(31) 


a-, = 


1 + ot\ 


On = 


1 + «2 


2 


2 


(32) 





















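By Lemma 1 in Laurent and Massart (2000), when $x_1, \dots, x_p$ are i.i.d. standard normal, we have

$$1 - 2\sqrt{C'} \le \frac{\sum_{i=1}^n x_i^2}{n} \le 1 + 2\sqrt{C'} + 2C',$$

$$1 - 2\sqrt{\frac{C'n}{p}} \le \frac{\sum_{i=1}^p x_i^2}{p} \le 1 + 2\sqrt{\frac{C'n}{p}} + \frac{2C'n}{p},$$

both hold with probability at least $1 - 4\exp(-C'n)$. Here we let $C' > 0$ be small enough and only depending on $\alpha_1, \alpha_2$ such that

$$\alpha_1' \le \frac{1 - 2\sqrt{C'}}{1 + 2\sqrt{C'} + 2C'}, \quad \frac{1 + 2\sqrt{C'} + 2C'}{1 - 2\sqrt{C'}} \le \alpha_2'.$$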


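Combining the previous inequalities and (31), we have for any fixed unit vector $v \in \mathbb{R}^d$,

$$\frac{\alpha_1' n}{p} \le \|U_{[1:n,:]}v\|_2^2 \le \frac{\alpha_2' n}{p} \tag{33}$$

with probability at least $1 - 4\exp(-C'n)$, where $C'$ only depends on $\alpha_1', \alpha_2'$. Next, based on Lemma 2.5 in Vershynin (2013), we can construct an $\varepsilon$-net $B$ on the unit sphere of $\mathbb{R}^d$ such that $|B| \le (1 + 2/\varepsilon)^d$, where $\varepsilon > 0$ is to be determined later. Under the event that $\{\forall v \in B, \text{(33) holds}\}$, we suppose

$$\kappa_1 = \min_{\|v\|_2 = 1}\|U_{[1:n,:]}v\|_2, \quad \kappa_2 = \max_{\|v\|_2 = 1}\|U_{[1:n,:]}v\|_2.$$

For any $v$ in the unit sphere of $\mathbb{R}^d$, there must exist $v' \in B$ such that $\|v - v'\|_2 \le \varepsilon$, which yields

$$\|U_{[1:n,:]}v\|_2 \le \|U_{[1:n,:]}v'\|_2 + \|U_{[1:n,:]}(v - v')\|_2 \le \sqrt{\alpha_2' n/p} + \kappa_2\varepsilon,$$

$$\|U_{[1:n,:]}v\|_2 \ge \|U_{[1:n,:]}v'\|_2 - \|U_{[1:n,:]}(v - v')\|_2 \ge \sqrt{\alpha_1' n/p} - \varepsilon\kappa_2.$$

These imply that $\kappa_2 \le \sqrt{\alpha_2' n/p}/(1 - \varepsilon)$ and $\kappa_1 \ge \sqrt{\alpha_1' n/p} - \varepsilon\kappa_2 \ge \sqrt{\alpha_1' n/p} - \sqrt{\alpha_2' n/p}\cdot\varepsilon/(1 - \varepsilon)$. Hence, we can take $\varepsilon$ depending on $\alpha_1, \alpha_2$ such that $\kappa_2 \le \sqrt{\alpha_2 n/p}$ and $\kappa_1 \ge \sqrt{\alpha_1 n/p}$, which implies (24).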


Finally we estimate the probability that the event $\{\forall v \in B, \text{(33) holds}\}$ happens. We choose $C \ge 4d\log(1 + 2/\varepsilon)/C'$ that only depends on $\alpha_1$ and $\alpha_2$. If $n \ge Cd$,

$$C'n/2 \ge d\log(1 + 2/\varepsilon) + \log 4,$$

so

$$1 - (1 + 2/\varepsilon)^d\cdot 4\exp(-C'n) = 1 - \exp\big(d\log(1 + 2/\varepsilon) + \log 4 - C'n\big) \ge 1 - \exp(-nC'/2).$$

Finally, we finish the proof of the lemma by setting $\delta = C'/2$. □
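The concentration behind (33) and (24) can be illustrated numerically. The following is a minimal simulation sketch, not part of the proof: it draws Haar-distributed orthonormal columns via a sign-corrected QR decomposition of a Gaussian matrix and compares the squared singular values of the first $n$ rows with $n/p$. The helper name `haar_orthonormal` and the dimensions are illustrative assumptions.

```python
import numpy as np

def haar_orthonormal(p, d, rng):
    """Draw a p x d matrix with orthonormal columns (Haar measure) via QR."""
    G = rng.standard_normal((p, d))
    Q, R = np.linalg.qr(G)                 # reduced QR: Q is p x d, R is d x d
    return Q * np.sign(np.diag(R))         # sign fix so the law is exactly Haar

rng = np.random.default_rng(0)
p, n, d = 400, 200, 3                      # illustrative sizes (assumptions)
ratios = []
for _ in range(200):
    U = haar_orthonormal(p, d, rng)
    s = np.linalg.svd(U[:n, :], compute_uv=False)   # singular values of U_[1:n,:]
    ratios.append([s.min() ** 2 / (n / p), s.max() ** 2 / (n / p)])
ratios = np.array(ratios)

# Both ratios should stay within constants alpha_1 < 1 < alpha_2 around 1.
print("smallest sigma_min^2 / (n/p):", ratios[:, 0].min())
print("largest  sigma_max^2 / (n/p):", ratios[:, 1].max())
```

In runs of this kind the two ratios stay close to 1 when $n$ is large relative to $d$, which is the qualitative content of the lemma.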


3 Proofs of the Results in the Main Paper

We prove Proposition 1, Theorems 1 and 2, Lemma 7, Lemma 8, Theorem 3, Corollary 1 and Corollary 2 in this section.

Proof of Proposition 1

Since $A_{1\bullet}$ is of rank $r$, which is the same as $A$, all rows of $A$ must be linear combinations of the rows of $A_{1\bullet}$. This implies all rows of $A_{\bullet 1}$ are linear combinations of the rows of $A_{11}$. Since $\mathrm{rank}(A_{\bullet 1}) = r$, we must have $\mathrm{rank}(A_{11}) \ge r$. Besides, $\mathrm{rank}(A_{11}) \le \mathrm{rank}(A) = r$ since $A_{11}$ is a submatrix of $A$. So $\mathrm{rank}(A_{11}) = r$. Similarly, the rows of $A_{21}$ are linear combinations of the rows of $A_{11}$, so we have

$$A_{21} = A_{21}P_{A_{11}^\top} = A_{21}A_{11}^\top(A_{11}A_{11}^\top)^\dagger A_{11} = A_{21}V\Sigma U^\top(U\Sigma^2U^\top)^\dagger A_{11} = (A_{21}V\Sigma^{-1}U^\top)A_{11},$$

namely the rows of $A_{21}$ are linear combinations of the rows of $A_{11}$. By the argument before, we know $A_{22}$ can be represented as the same linear combination of $A_{12}$ as $A_{21}$ by $A_{11}$, so we have $A_{22} = (A_{21}V\Sigma^{-1}U^\top)A_{12} = A_{21}V\Sigma^{-1}U^\top A_{12} = A_{21}A_{11}^\dagger A_{12}$, which concludes the proof. □
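The identity $A_{22} = A_{21}A_{11}^\dagger A_{12}$ is easy to check numerically. The sketch below is an illustration under stated assumptions: the matrix sizes, the random seed and the pseudo-inverse cutoff `rcond=1e-10` are arbitrary choices, and the exactly rank-$r$ matrix is generated as a product of Gaussian factors so that $\mathrm{rank}(A_{11}) = r$ holds almost surely.

```python
import numpy as np

rng = np.random.default_rng(1)
p1, p2, m1, m2, r = 12, 10, 5, 4, 3        # illustrative sizes (assumptions)

# Exactly rank-r matrix A = X Y^T, partitioned into the four blocks.
A = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))
A11, A12 = A[:m1, :m2], A[:m1, m2:]
A21, A22 = A[m1:, :m2], A[m1:, m2:]

# The condition rank(A11) = rank(A) = r of the proposition.
assert np.linalg.matrix_rank(A11, tol=1e-8) == r

# Recover the unobserved block from the three observed ones.
A22_rec = A21 @ np.linalg.pinv(A11, rcond=1e-10) @ A12
print("max abs recovery error:", np.max(np.abs(A22_rec - A22)))
```

The explicit cutoff matters here because $A_{11}$ is rank deficient as a $5\times 4$ matrix; its numerically tiny fourth singular value must be treated as zero by the pseudo-inverse.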


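Proof of Theorem 1

Suppose $M \in \mathbb{R}^{m_1\times r}$, $N \in \mathbb{R}^{m_2\times r}$ are column orthonormalized matrices of $U_{11}$ and $V_{11}$. $\hat M \in \mathbb{R}^{m_1\times r}$ and $\hat N \in \mathbb{R}^{m_2\times r}$ are the first $r$ left singular vectors of $A_{1\bullet}$ and $A_{\bullet 1}^\top$, respectively. Also, recall that we use $P_U = U(U^\top U)^\dagger U^\top$ to represent the projection onto the column space of $U$.

1. We first give the lower bound for $\sigma_{\min}(\hat M^\top M)$, $\sigma_{\min}(\hat N^\top N)$ by the unilateral perturbation bound result in Cai and Zhang (2014). Since

$$P_{U_{11}}A_{1\bullet} = P_{U_{11}}U_{1\bullet}\Sigma V^\top = [U_{11}\Sigma_1,\ P_{U_{11}}U_{12}\Sigma_2]V^\top, \quad P_{U_{11}^\perp}A_{1\bullet} = P_{U_{11}^\perp}U_{1\bullet}\Sigma V^\top = [0,\ P_{U_{11}^\perp}U_{12}\Sigma_2]V^\top,$$

and $V$ is an orthogonal matrix, we can see

$$\sigma_r(P_{U_{11}}A_{1\bullet}) = \sigma_r([U_{11}\Sigma_1\ \ P_{U_{11}}U_{12}\Sigma_2]) \ge \sigma_r(U_{11}\Sigma_1) \ge \sigma_r(A)\sigma_{\min}(U_{11}),$$

$$\|P_{U_{11}^\perp}A_{1\bullet}\| = \|P_{U_{11}^\perp}U_{12}\Sigma_2\| \le \|P_{U_{11}^\perp}U_{12}\|\,\|\Sigma_2\| \le \sigma_{r+1}(A).$$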

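So $\sigma_r(P_{U_{11}}A_{1\bullet}) > \|P_{U_{11}^\perp}A_{1\bullet}\|$. Besides, $\mathrm{rank}(P_{U_{11}}A_{1\bullet}) \le r$. Applying the unilateral perturbation bound result in Cai and Zhang (2014) with $X = P_{U_{11}}A_{1\bullet}$, $Y = P_{U_{11}^\perp}A_{1\bullet}$, we have

$$\sigma_{\min}^2(\hat M^\top M) \ge 1 - \left(\frac{\|Y P_{X^\top}\|\cdot\sigma_{r+1}(A)}{\sigma_r^2(A)\sigma_{\min}^2(U_{11}) - \sigma_{r+1}^2(A)}\right)^2. \tag{34}$$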

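Moreover, $A_{1\bullet} = [U_{11}\ U_{12}]\,\mathrm{diag}(\Sigma_1, \Sigma_2)V^\top = [U_{11}\Sigma_1\ \ U_{12}\Sigma_2]V^\top$, and hence

$$\|YP_{X^\top}\| = \|P_{U_{11}^\perp}A_{1\bullet}\cdot P_{(P_{U_{11}}A_{1\bullet})^\top}\| = \big\|[0\ \ P_{U_{11}^\perp}U_{12}\Sigma_2]V^\top\cdot P_{V[U_{11}\Sigma_1\ \ P_{U_{11}}U_{12}\Sigma_2]^\top}\big\|$$

$$= \big\|[0\ \ P_{U_{11}^\perp}U_{12}\Sigma_2]\cdot P_{[U_{11}\Sigma_1\ \ P_{U_{11}}U_{12}\Sigma_2]^\top}\big\| = \sup_{x\in\mathbb{R}^{p_2},\,\|x\|_2 = 1}\big\|[0\ \ P_{U_{11}^\perp}U_{12}\Sigma_2]\cdot P_{[U_{11}\Sigma_1\ \ P_{U_{11}}U_{12}\Sigma_2]^\top}x\big\|_2.$$

When $\|x\|_2 = 1$, let $y$ denote the projection of $x$ onto the column space of $[U_{11}\Sigma_1\ \ P_{U_{11}}U_{12}\Sigma_2]^\top$. Then $\|y\|_2 \le 1$ and $y$ is in the column space of $[U_{11}\Sigma_1\ \ P_{U_{11}}U_{12}\Sigma_2]^\top$. Hence,

$$\frac{\|y_{[1:r]}\|_2}{\|y_{[(r+1):p_2]}\|_2} \ge \frac{\sigma_{\min}(U_{11}\Sigma_1)}{\|P_{U_{11}}U_{12}\Sigma_2\|} \ge \frac{\sigma_{\min}(U_{11})\sigma_r(A)}{\sigma_{r+1}(A)},$$

and $\|y_{[(r+1):p_2]}\|_2^2 + \|y_{[1:r]}\|_2^2 \le 1$, which implies $\|y_{[(r+1):p_2]}\|_2^2 \le \sigma_{r+1}^2(A)/\big(\sigma_{\min}^2(U_{11})\sigma_r^2(A) + \sigma_{r+1}^2(A)\big)$. Hence for all $x \in \mathbb{R}^{p_2}$ such that $\|x\|_2 = 1$,

$$\big\|[0\ \ P_{U_{11}^\perp}U_{12}\Sigma_2]\cdot P_{[U_{11}\Sigma_1\ \ P_{U_{11}}U_{12}\Sigma_2]^\top}x\big\|_2 \le \|P_{U_{11}^\perp}U_{12}\Sigma_2\|\cdot\|y_{[(r+1):p_2]}\|_2 \le \sigma_{r+1}(A)\cdot\frac{\sigma_{r+1}(A)}{\sqrt{\sigma_{r+1}^2(A) + \sigma_{\min}^2(U_{11})\sigma_r^2(A)}}.$$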

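This yields $\|YP_{X^\top}\| = \|P_{U_{11}^\perp}A_{1\bullet}\cdot P_{(P_{U_{11}}A_{1\bullet})^\top}\| \le \sigma_{r+1}^2(A)\big/\sqrt{\sigma_{r+1}^2(A) + \sigma_{\min}^2(U_{11})\sigma_r^2(A)}$. Combining (34), we have

$$\sigma_{\min}^2(\hat M^\top M) \ge 1 - \left(\frac{\sigma_{r+1}^3(A)}{\sqrt{\sigma_{r+1}^2(A) + \sigma_{\min}^2(U_{11})\sigma_r^2(A)}\,\big(\sigma_r^2(A)\sigma_{\min}^2(U_{11}) - \sigma_{r+1}^2(A)\big)}\right)^2. \tag{35}$$

Since $\sigma_{\min}(U_{11})\sigma_r(A) \ge 2\sigma_{r+1}(A)$, we have

$$\sigma_{\min}^2(\hat M^\top M) \ge 1 - \left(\frac{1}{\sqrt{5}\cdot 3}\right)^2 = \frac{44}{45}.$$

Similarly, we also have $\sigma_{\min}^2(\hat N^\top N) \ge \frac{44}{45}$.
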
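2. Following (8),

$$\hat A_{22} = U_{2\bullet}\Sigma V_{1\bullet}^\top\hat N\big(\hat M^\top(U_{1\bullet}\Sigma V_{1\bullet}^\top)\hat N\big)^{-1}\hat M^\top U_{1\bullet}\Sigma V_{2\bullet}^\top$$

$$= \big(U_{21}\Sigma_1V_{11}^\top\hat N + U_{22}\Sigma_2V_{12}^\top\hat N\big)\big(\hat M^\top U_{11}\Sigma_1V_{11}^\top\hat N + \hat M^\top U_{12}\Sigma_2V_{12}^\top\hat N\big)^{-1}\big(\hat M^\top U_{11}\Sigma_1V_{21}^\top + \hat M^\top U_{12}\Sigma_2V_{22}^\top\big).$$

Let "L", "M", "R" stand for "Left", "Middle" and "Right",

$$B_L = U_{21}\Sigma_1V_{11}^\top\hat N, \quad E_L = U_{22}\Sigma_2V_{12}^\top\hat N; \tag{36}$$

$$B_M = \hat M^\top U_{11}\Sigma_1V_{11}^\top\hat N, \quad E_M = \hat M^\top U_{12}\Sigma_2V_{12}^\top\hat N; \tag{37}$$

$$B_R = \hat M^\top U_{11}\Sigma_1V_{21}^\top, \quad E_R = \hat M^\top U_{12}\Sigma_2V_{22}^\top. \tag{38}$$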

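By Lemma 2 in the Supplement, we can see the following properties of these matrices,

$$\|E_L\| \le \sigma_{r+1}(A), \quad \|E_M\| \le \sigma_{r+1}(A), \quad \|E_R\| \le \sigma_{r+1}(A); \tag{39}$$

$$\|E_L\|_q \le \|\Sigma_2\|_q, \quad \|E_M\|_q \le \|\Sigma_2\|_q, \quad \|E_R\|_q \le \|\Sigma_2\|_q; \tag{40}$$

$$\sigma_{\min}(B_M) = \sigma_{\min}\big(\hat M^\top(P_MU_{11})\Sigma_1(V_{11}^\top P_N)\hat N\big) = \sigma_{\min}\big[(\hat M^\top M)(M^\top U_{11})\Sigma_1(V_{11}^\top N)(N^\top\hat N)\big]$$

$$\ge \sigma_{\min}(\Sigma_1)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})\sigma_{\min}(\hat M^\top M)\sigma_{\min}(N^\top\hat N) \ge \frac{44}{45}\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}); \tag{41}$$

$$\|B_M^{-1}\| = \sigma_{\min}^{-1}(B_M) \le \frac{45}{44\,\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})}; \tag{42}$$

$$\hat A_{22} = (B_L + E_L)(B_M + E_M)^{-1}(B_R + E_R), \quad B_LB_M^{-1}B_R = U_{21}\Sigma_1V_{21}^\top; \tag{43}$$

$$\|B_LB_M^{-1}\| = \|U_{21}\Sigma_1(V_{11}^\top\hat N)(V_{11}^\top\hat N)^{-1}\Sigma_1^{-1}(\hat M^\top U_{11})^{-1}\| = \|U_{21}(\hat M^\top U_{11})^{-1}\|$$

$$\le \|(\hat M^\top MM^\top U_{11})^{-1}\| \le \frac{1}{\sigma_{\min}(\hat M^\top M)\sigma_{\min}(M^\top U_{11})} \le \frac{\sqrt{45/44}}{\sigma_{\min}(U_{11})}; \tag{44}$$

$$\|B_M^{-1}B_R\| = \|(V_{11}^\top\hat N)^{-1}V_{21}^\top\| \le \frac{\sqrt{45/44}}{\sigma_{\min}(V_{11})}. \tag{45}$$

By (39), (41) and the assumption (10), we can see $\sigma_{\min}(B_M) > \|E_M\|$, so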


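$$\hat A_{22} = (B_L + E_L)\big(B_M^{-1} - B_M^{-1}E_MB_M^{-1} + B_M^{-1}E_MB_M^{-1}E_MB_M^{-1} - \cdots\big)(B_R + E_R);$$

$$\|\hat A_{22} - B_LB_M^{-1}B_R\|_q \le \Big\|B_LB_M^{-1}E_M\sum_{i=0}^\infty(-B_M^{-1}E_M)^iB_M^{-1}B_R\Big\|_q + \Big\|E_L\sum_{i=0}^\infty(-B_M^{-1}E_M)^iB_M^{-1}B_R\Big\|_q$$

$$+ \Big\|B_LB_M^{-1}\sum_{i=0}^\infty(-E_MB_M^{-1})^iE_R\Big\|_q + \Big\|E_LB_M^{-1}\sum_{i=0}^\infty(-E_MB_M^{-1})^iE_R\Big\|_q$$

$$\le \|B_LB_M^{-1}\|\,\|E_M\|_q\sum_{i=0}^\infty\|E_M\|^i\|B_M^{-1}\|^i\,\|B_M^{-1}B_R\| + \|E_L\|_q\sum_{i=0}^\infty\|E_M\|^i\|B_M^{-1}\|^i\,\|B_M^{-1}B_R\|$$

$$+ \|B_LB_M^{-1}\|\sum_{i=0}^\infty\|E_M\|^i\|B_M^{-1}\|^i\,\|E_R\|_q + \|E_L\|_q\sum_{i=0}^\infty\|E_M\|^i\|B_M^{-1}\|^{i+1}\|E_R\|$$

$$\le \frac{\|\Sigma_2\|_q}{1 - \sigma_{r+1}(A)\|B_M^{-1}\|}\Big(\|B_LB_M^{-1}\|\,\|B_M^{-1}B_R\| + \|B_M^{-1}B_R\| + \|B_LB_M^{-1}\| + \|B_M^{-1}\|\sigma_{r+1}(A)\Big)$$

$$\le \frac{\|A_{-\max(r)}\|_q}{1 - \frac{45\sigma_{r+1}(A)}{44\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})}}\left(\frac{45/44}{\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})} + \frac{\sqrt{45/44}}{\sigma_{\min}(U_{11})} + \frac{\sqrt{45/44}}{\sigma_{\min}(V_{11})} + \frac{45\sigma_{r+1}(A)}{44\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})}\right)$$

$$\le \frac{88}{43}\|A_{-\max(r)}\|_q\left(\frac{45/44}{\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})} + \frac{\sqrt{45/44}}{\sigma_{\min}(U_{11})} + \frac{\sqrt{45/44}}{\sigma_{\min}(V_{11})} + \frac{45}{88}\right).$$

Finally, since $A_{22} = U_{21}\Sigma_1V_{21}^\top + U_{22}\Sigma_2V_{22}^\top = B_LB_M^{-1}B_R + U_{22}\Sigma_2V_{22}^\top$, we have

$$\|\hat A_{22} - A_{22}\|_q \le \|\hat A_{22} - B_LB_M^{-1}B_R\|_q + \|U_{22}\Sigma_2V_{22}^\top\|_q \le 3\|A_{-\max(r)}\|_q\left(1 + \frac{1}{\sigma_{\min}(U_{11})}\right)\left(1 + \frac{1}{\sigma_{\min}(V_{11})}\right).$$

□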
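The estimator analyzed in this proof, $\hat A_{22} = A_{21}\hat N(\hat M^\top A_{11}\hat N)^{-1}\hat M^\top A_{12}$, can be sketched in a few lines. The code below is a hedged illustration, not the authors' implementation: the function name `smc_fixed_rank`, the matrix sizes and the singular value profile are assumptions, and the comparison uses the Frobenius norm ($q = 2$) together with the bound $3\|A_{-\max(r)}\|_q(1 + 1/\sigma_{\min}(U_{11}))(1 + 1/\sigma_{\min}(V_{11}))$ derived above.

```python
import numpy as np

def smc_fixed_rank(A11, A12, A21, r):
    """Sketch of A22_hat = A21 N (M^T A11 N)^{-1} M^T A12 with rank-r bases."""
    # M_hat: first r left singular vectors of the observed rows [A11 A12]
    M_hat = np.linalg.svd(np.hstack([A11, A12]), full_matrices=False)[0][:, :r]
    # N_hat: first r right singular vectors of the observed columns [A11; A21]
    N_hat = np.linalg.svd(np.vstack([A11, A21]), full_matrices=False)[2][:r, :].T
    core = M_hat.T @ A11 @ N_hat                       # r x r middle block
    return A21 @ N_hat @ np.linalg.solve(core, M_hat.T @ A12)

rng = np.random.default_rng(2)
p1, p2, m1, m2, r = 60, 50, 30, 25, 3                  # illustrative sizes
U = np.linalg.qr(rng.standard_normal((p1, p1)))[0]
V = np.linalg.qr(rng.standard_normal((p2, p2)))[0]
k = min(p1, p2)
s = np.concatenate([[10.0, 9.0, 8.0], 1e-3 * np.ones(k - r)])  # approx. rank 3
A = (U[:, :k] * s) @ V[:, :k].T                        # singular values of A are s
A11, A12, A21, A22 = A[:m1, :m2], A[:m1, m2:], A[m1:, :m2], A[m1:, m2:]

A22_hat = smc_fixed_rank(A11, A12, A21, r)
err = np.linalg.norm(A22_hat - A22)                    # Frobenius norm, q = 2
tail = np.linalg.norm(s[r:])                           # ||A_{-max(r)}||_2
su = np.linalg.svd(U[:m1, :r], compute_uv=False).min() # sigma_min(U11)
sv = np.linalg.svd(V[:m2, :r], compute_uv=False).min() # sigma_min(V11)
bound = 3 * tail * (1 + 1 / su) * (1 + 1 / sv)
print("error:", err, "bound:", bound)
```

In this synthetic setting the gap between $\sigma_r(A)$ and $\sigma_{r+1}(A)$ is large, so the spectral gap condition used in the proof holds with a wide margin.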


Proof of Theorem 2

We only present the proof for row thresholding, as column thresholding is essentially the same by working with $A^\top$. Suppose $M, N$ are orthonormal bases of the column vectors of $U_{11}, V_{11}$. We denote $U^{(2)}_{[:,1:r]} = \hat M$, $V^{(1)}_{[:,1:r]} = \hat N$, which are exactly the same as the $\hat M$ and $\hat N$ in Algorithm 1. Similarly to the proof of Theorem 1, we have (35). Due to the assumption that

$$\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}) \ge 4\sigma_{r+1}(A),$$

(35) yields

$$\sigma_{\min}^2(\hat M^\top M) \ge 3824/3825, \quad \sigma_{\min}^2(\hat N^\top N) \ge 3824/3825. \tag{46}$$

As shown in the Supplementary material, we have

Lemma 7 Under the assumption of Theorem 2, we have $\hat r \ge r$.

We next show (13) with the condition that $\hat r \ge r$ in steps.


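1. Note that $A_{11} = U_{11}\Sigma_1V_{11}^\top + U_{12}\Sigma_2V_{12}^\top$. We consider the decompositions of $Z$ and let

$$Z_{11} = U^{(2)\top}U_{11}\Sigma_1V_{11}^\top V^{(1)} + U^{(2)\top}U_{12}\Sigma_2V_{12}^\top V^{(1)},$$

$$Z_{11,[1:\hat r,1:\hat r]} = U^{(2)\top}_{[:,1:\hat r]}U_{11}\Sigma_1V_{11}^\top V^{(1)}_{[:,1:\hat r]} + U^{(2)\top}_{[:,1:\hat r]}U_{12}\Sigma_2V_{12}^\top V^{(1)}_{[:,1:\hat r]} \triangleq B_{M,\hat r} + E_{M,\hat r}, \tag{47}$$

$$Z_{21,[:,1:\hat r]} = U_{21}\Sigma_1V_{11}^\top V^{(1)}_{[:,1:\hat r]} + U_{22}\Sigma_2V_{12}^\top V^{(1)}_{[:,1:\hat r]} \triangleq B_{L,\hat r} + E_{L,\hat r}, \tag{48}$$

$$Z_{12,[1:\hat r,:]} = U^{(2)\top}_{[:,1:\hat r]}U_{11}\Sigma_1V_{21}^\top + U^{(2)\top}_{[:,1:\hat r]}U_{12}\Sigma_2V_{22}^\top \triangleq B_{R,\hat r} + E_{R,\hat r}. \tag{49}$$

Note that the square matrix $U^{(2)\top}_{[:,1:r]}M \in \mathbb{R}^{r\times r}$ is a submatrix of $U^{(2)\top}_{[:,1:\hat r]}M \in \mathbb{R}^{\hat r\times r}$, so by (46) we know

$$\sigma_{\min}\big(U^{(2)\top}_{[:,1:\hat r]}M\big) \ge \sigma_{\min}\big(U^{(2)\top}_{[:,1:r]}M\big) = \sigma_{\min}(\hat M^\top M) \ge \sqrt{\frac{3824}{3825}}. \tag{50}$$

Similarly, $\sigma_{\min}\big(V^{(1)\top}_{[:,1:\hat r]}N\big) \ge \sqrt{\frac{3824}{3825}}$. Since $M, N$ are orthonormal bases of the column vectors of $U_{11}, V_{11}$, we have $P_M = MM^\top$, $P_N = NN^\top$, and

$$\sigma_{\min}\big(U^{(2)\top}_{[:,1:\hat r]}U_{11}\big) = \sigma_{\min}\big(U^{(2)\top}_{[:,1:\hat r]}P_MU_{11}\big) \ge \sigma_{\min}\big(U^{(2)\top}_{[:,1:\hat r]}M\big)\sigma_{\min}(M^\top U_{11}) \ge \sqrt{\frac{3824}{3825}}\,\sigma_{\min}(U_{11}); \tag{51}$$

similarly, we also have

$$\sigma_{\min}\big(V^{(1)\top}_{[:,1:\hat r]}V_{11}\big) \ge \sqrt{\frac{3824}{3825}}\,\sigma_{\min}(V_{11}). \tag{52}$$

(51) and (52) immediately yield

$$\sigma_r(B_{M,\hat r}) \ge \frac{3824}{3825}\sigma_{\min}(U_{11})\sigma_{\min}(\Sigma_1)\sigma_{\min}(V_{11}) = \frac{3824}{3825}\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}). \tag{53}$$

Besides, we also have

$$\|E_{M,\hat r}\| \le \|\Sigma_2\| \le \sigma_{r+1}(A). \tag{54}$$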

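2. Next, we consider the SVD of $Z_{11,[1:\hat r,1:\hat r]}$,

$$Z_{11,[1:\hat r,1:\hat r]} = J\Lambda K^\top, \quad J, \Lambda, K \in \mathbb{R}^{\hat r\times\hat r}. \tag{55}$$

For convenience, we denote $\Lambda_1 = \Lambda_{[1:r,1:r]}$, $\Lambda_2 = \Lambda_{[(r+1):\hat r,(r+1):\hat r]}$,

$$J_1 = J_{[:,1:r]}, \quad J_2 = J_{[:,(r+1):\hat r]}, \quad K_1 = K_{[:,1:r]}, \quad K_2 = K_{[:,(r+1):\hat r]}. \tag{56}$$

Suppose $M_Z \in \mathbb{R}^{\hat r\times r}$ is an orthonormal basis of the column space of $B_{M,\hat r}$, and $N_Z \in \mathbb{R}^{\hat r\times r}$ is an orthonormal basis of the column space of $B_{M,\hat r}^\top$. Denote by $\mathrm{span}(\cdot)$ the linear span of the columns of a matrix. We want to show that $\mathrm{span}(M_Z)$ is close to $\mathrm{span}(J_1)$, while $\mathrm{span}(N_Z)$ is close to $\mathrm{span}(K_1)$. So in the rest of this step, we establish bounds for $\sigma_{\min}(J_1^\top M_Z)$ and $\sigma_{\min}(K_1^\top N_Z)$. Actually,

$$B_{M,\hat r} + E_{M,\hat r} = \big(B_{M,\hat r} + P_{M_Z}E_{M,\hat r}\big) + P_{M_Z^\perp}E_{M,\hat r}.$$

Now we set $X = B_{M,\hat r} + P_{M_Z}E_{M,\hat r}$, $Y = P_{M_Z^\perp}E_{M,\hat r}$; then by (53) and (54) we have

$$\sigma_r(X) \ge \frac{3824}{3825}\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}) - \sigma_{r+1}(A),$$

$$\sigma_{r+1}(A) \ge \|E_{M,\hat r}\| \ge \|Y\|.$$

Besides, by the definition of $B_{M,\hat r}$ and $M_Z$ we know $\mathrm{rank}(X) \le r$. Also, based on the definition of $Y$, we know $P_XY = 0$. Now the unilateral perturbation bound in Cai and Zhang (2014) yields

$$\sigma_{\min}^2(M_Z^\top J_1) \ge 1 - \left(\frac{\sigma_r(X)\cdot\|Y\|}{\sigma_r^2(X) - \|Y\|^2}\right)^2. \tag{57}$$

The right hand side of the inequality above is an increasing function of $\sigma_r(X)$. Since $\sigma_r(X) \ge \frac{3824}{3825}\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}) - \sigma_{r+1}(A) \ge \big(3 - \frac{4}{3825}\big)\sigma_{r+1}(A) \ge \big(3 - \frac{4}{3825}\big)\|Y\|$,

$$\sigma_{\min}^2(J_1^\top M_Z) \ge 1 - \left(\frac{3 - 4/3825}{(3 - 4/3825)^2 - 1}\right)^2 > 0.859. \tag{58}$$

Similarly, we also have

$$\sigma_{\min}^2(K_1^\top N_Z) \ge 0.859. \tag{59}$$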

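3. We next derive useful expressions of $A_{22}$ and $\hat A_{22}$. First we introduce the following quantities,

$$J_1^\top Z_{11,[1:\hat r,1:\hat r]}K_1 = J_1^\top B_{M,\hat r}K_1 + J_1^\top E_{M,\hat r}K_1 \triangleq B_{M1} + E_{M1}, \tag{60}$$

$$J_2^\top Z_{11,[1:\hat r,1:\hat r]}K_2 = J_2^\top B_{M,\hat r}K_2 + J_2^\top E_{M,\hat r}K_2 \triangleq B_{M2} + E_{M2}, \tag{61}$$

$$Z_{21,[:,1:\hat r]}K_1 = B_{L,\hat r}K_1 + E_{L,\hat r}K_1 \triangleq B_{L1} + E_{L1}, \tag{62}$$

$$Z_{21,[:,1:\hat r]}K_2 = B_{L,\hat r}K_2 + E_{L,\hat r}K_2 \triangleq B_{L2} + E_{L2}, \tag{63}$$

$$J_1^\top Z_{12,[1:\hat r,:]} = J_1^\top B_{R,\hat r} + J_1^\top E_{R,\hat r} \triangleq B_{R1} + E_{R1}, \tag{64}$$

$$J_2^\top Z_{12,[1:\hat r,:]} = J_2^\top B_{R,\hat r} + J_2^\top E_{R,\hat r} \triangleq B_{R2} + E_{R2}. \tag{65}$$

Since

$$B_{L1}B_{M1}^{-1}B_{R1} = B_{L,\hat r}K_1\big(J_1^\top B_{M,\hat r}K_1\big)^{-1}J_1^\top B_{R,\hat r}$$

$$= U_{21}\Sigma_1V_{11}^\top V^{(1)}_{[:,1:\hat r]}K_1\big(J_1^\top U^{(2)\top}_{[:,1:\hat r]}U_{11}\Sigma_1V_{11}^\top V^{(1)}_{[:,1:\hat r]}K_1\big)^{-1}J_1^\top U^{(2)\top}_{[:,1:\hat r]}U_{11}\Sigma_1V_{21}^\top = U_{21}\Sigma_1V_{21}^\top, \tag{66}$$

we can characterize $A_{22}$, $\hat A_{22}$ by these new notations as

$$A_{22} = U_{21}\Sigma_1V_{21}^\top + U_{22}\Sigma_2V_{22}^\top = B_{L1}B_{M1}^{-1}B_{R1} + U_{22}\Sigma_2V_{22}^\top, \tag{67}$$

$$\hat A_{22} = Z_{21,[:,1:\hat r]}Z_{11,[1:\hat r,1:\hat r]}^{-1}Z_{12,[1:\hat r,:]} = Z_{21,[:,1:\hat r]}K\big(J^\top Z_{11,[1:\hat r,1:\hat r]}K\big)^{-1}J^\top Z_{12,[1:\hat r,:]}$$

$$= \big[Z_{21,[:,1:\hat r]}K_1\ \ Z_{21,[:,1:\hat r]}K_2\big]\begin{bmatrix}J_1^\top Z_{11,[1:\hat r,1:\hat r]}K_1 & 0\\ 0 & J_2^\top Z_{11,[1:\hat r,1:\hat r]}K_2\end{bmatrix}^{-1}\begin{bmatrix}J_1^\top Z_{12,[1:\hat r,:]}\\ J_2^\top Z_{12,[1:\hat r,:]}\end{bmatrix}$$

$$= \sum_{k=1}^2(B_{Lk} + E_{Lk})(B_{Mk} + E_{Mk})^{-1}(B_{Rk} + E_{Rk}). \tag{68}$$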

4. We now establish a number of bounds for the terms on the right hand side of (60)-(65).

Lemma 8 Based on the assumptions above, we have

$$\sigma_{\min}(B_{M1}) \ge 3.43\,\sigma_{r+1}(A); \tag{69}$$

$$\|B_{L1}B_{M1}^{-1}\| \le \frac{\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(U_{11})}, \quad \|B_{M1}^{-1}B_{R1}\| \le \frac{\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(V_{11})}; \tag{70}$$

$$\|E_{Mt}\|_q \le \|A_{-\max(r)}\|_q, \quad \|E_{Lt}\|_q \le \|A_{-\max(r)}\|_q, \quad \|E_{Rt}\|_q \le \|A_{-\max(r)}\|_q, \quad t = 1, 2; \tag{71}$$

$$\|(B_{L2} + E_{L2})(B_{M2} + E_{M2})^{-1}\| \le T_R + \left(\frac{\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(U_{11})} + \frac{1}{3.43}\right)\cdot\frac{1}{1 - 1/3.43}; \tag{72}$$

$$\|B_{R2}\|_q \le \frac{2\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(V_{11})}\|A_{-\max(r)}\|_q. \tag{73}$$

The proof of Lemma 8 is given in the Supplement.


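5. We finally give the upper bound of $\|\hat A_{22} - A_{22}\|_q$. By (67) and (68), we can split the loss as

$$\hat A_{22} - A_{22} = \big((B_{L1} + E_{L1})(B_{M1} + E_{M1})^{-1}(B_{R1} + E_{R1}) - B_{L1}B_{M1}^{-1}B_{R1}\big) + (B_{L2} + E_{L2})(B_{M2} + E_{M2})^{-1}(B_{R2} + E_{R2}) - U_{22}\Sigma_2V_{22}^\top. \tag{74}$$

We will analyze them separately. First, $\|U_{22}\Sigma_2V_{22}^\top\|_q \le \|A_{-\max(r)}\|_q$; second,

$$\|(B_{L2} + E_{L2})(B_{M2} + E_{M2})^{-1}(B_{R2} + E_{R2})\|_q \le \|(B_{L2} + E_{L2})(B_{M2} + E_{M2})^{-1}\|\cdot\big(\|B_{R2}\|_q + \|E_{R2}\|_q\big)$$

$$\le \left(T_R + \left(\frac{\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(U_{11})} + \frac{1}{3.43}\right)\frac{1}{1 - 1/3.43}\right)\left(\frac{2\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(V_{11})} + 1\right)\|A_{-\max(r)}\|_q$$

$$\le \left(T_R + \frac{1.53}{\sigma_{\min}(U_{11})} + 0.412\right)\left(\frac{2.16}{\sigma_{\min}(V_{11})} + 1\right)\|A_{-\max(r)}\|_q. \tag{75}$$

The analysis of $(B_{L1} + E_{L1})(B_{M1} + E_{M1})^{-1}(B_{R1} + E_{R1}) - B_{L1}B_{M1}^{-1}B_{R1}$ is similar to the proof of Theorem 1. We have

$$\big\|(B_{L1} + E_{L1})(B_{M1} + E_{M1})^{-1}(B_{R1} + E_{R1}) - B_{L1}B_{M1}^{-1}B_{R1}\big\|_q$$

$$\le \Big\|B_{L1}B_{M1}^{-1}E_{M1}\sum_{i=0}^\infty(-B_{M1}^{-1}E_{M1})^iB_{M1}^{-1}B_{R1}\Big\|_q + \Big\|E_{L1}\sum_{i=0}^\infty(-B_{M1}^{-1}E_{M1})^iB_{M1}^{-1}B_{R1}\Big\|_q$$

$$+ \Big\|B_{L1}B_{M1}^{-1}\sum_{i=0}^\infty(-E_{M1}B_{M1}^{-1})^iE_{R1}\Big\|_q + \Big\|E_{L1}B_{M1}^{-1}\sum_{i=0}^\infty(-E_{M1}B_{M1}^{-1})^iE_{R1}\Big\|_q$$

$$\le \|B_{L1}B_{M1}^{-1}\|\,\|E_{M1}\|_q\sum_{i=0}^\infty\|E_{M1}\|^i\|B_{M1}^{-1}\|^i\,\|B_{M1}^{-1}B_{R1}\| + \|E_{L1}\|_q\sum_{i=0}^\infty\|E_{M1}\|^i\|B_{M1}^{-1}\|^i\,\|B_{M1}^{-1}B_{R1}\|$$

$$+ \|B_{L1}B_{M1}^{-1}\|\sum_{i=0}^\infty\|E_{M1}\|^i\|B_{M1}^{-1}\|^i\,\|E_{R1}\|_q + \|E_{L1}\|_q\sum_{i=0}^\infty\|E_{M1}\|^i\|B_{M1}^{-1}\|^{i+1}\|E_{R1}\|$$

$$\le \frac{\|A_{-\max(r)}\|_q}{1 - \sigma_{r+1}(A)\|B_{M1}^{-1}\|}\Big(\|B_{L1}B_{M1}^{-1}\|\,\|B_{M1}^{-1}B_{R1}\| + \|B_{M1}^{-1}B_{R1}\| + \|B_{L1}B_{M1}^{-1}\| + \|B_{M1}^{-1}\|\sigma_{r+1}(A)\Big)$$

$$\le \left(\frac{1.65}{\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})} + \frac{1.53}{\sigma_{\min}(V_{11})} + \frac{1.53}{\sigma_{\min}(U_{11})} + 0.42\right)\|A_{-\max(r)}\|_q. \tag{76}$$

From (75), (76), (74), and the fact that $\sigma_{\min}(U_{11}) \le 1$ and $T_R \ge \frac{1.36}{\sigma_{\min}(U_{11})} + 0.35$,

$$\|\hat A_{22} - A_{22}\|_q \le \left(2.16\,T_R + \frac{4.95}{\sigma_{\min}(U_{11})} + 2.42\right)\left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right)\|A_{-\max(r)}\|_q$$

$$\le \left(2.16\,T_R + 4.31\left(\frac{1.36}{\sigma_{\min}(U_{11})} + 0.35\right)\right)\left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right)\|A_{-\max(r)}\|_q$$

$$\le 6.5\,T_R\left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right)\|A_{-\max(r)}\|_q. \tag{77}$$

This concludes the proof. □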
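The numerical constants used in the proofs of Theorems 1 and 2 can be re-derived with a few lines of arithmetic. The snippet below is only a sanity check under the reading of the bounds adopted here (the spectral gap ratios 2 and 4, and the geometric-series factor $1/(1 - 1/3.43)$); the variable names are illustrative.

```python
import math

c2 = 1 - (1 / (math.sqrt(5) * 3)) ** 2           # gap ratio 2 gives 44/45
c4 = 1 - (1 / (math.sqrt(17) * 15)) ** 2         # gap ratio 4 gives 3824/3825, see (46)
k = 3 - 4 / 3825                                 # lower bound on sigma_r(X) / ||Y||
c58 = 1 - (k / (k ** 2 - 1)) ** 2                # right hand side of (58)
c69 = 0.859 * (3824 / 3825) * 4                  # constant in (69)
g = math.sqrt(3825 / 3824) / math.sqrt(0.859)    # factor appearing in (70)
geo = 1 / (1 - 1 / 3.43)                         # geometric-series factor in (72)

print("44/45 check:      ", c2, 44 / 45)
print("3824/3825 check:  ", c4, 3824 / 3825)
print("(58) constant:    ", c58)
print("(69) constant:    ", c69)
print("(75) constants:   ", g * geo, geo / 3.43, 2 * g)
print("(76) constants:   ", g * g * geo, geo / 3.43)
print("(77) constant:    ", 2.16 + 4.31)
```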


Proof of Lemma 7.

In order to prove this lemma, we just need to prove that the for-loop in Algorithm 2 will break for some $s \ge r$. This can be shown by proving that the break condition

$$\big\|Z_{21,[:,1:s]}Z_{11,[1:s,1:s]}^{-1}\big\| \le T_R$$

holds for $s = r$.
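The break condition above suggests the following sketch of the row-thresholding procedure. It is an illustration consistent with the quantities $Z_{11}$, $Z_{12}$, $Z_{21}$ used in these proofs, not a verbatim copy of Algorithm 2: the function name `smc_row_thresholding`, the loop order (largest $s$ first), the handling of singular blocks and the synthetic example are assumptions.

```python
import numpy as np

def smc_row_thresholding(A11, A12, A21, T_R):
    """Sketch: choose r_hat by the break condition, then form A22_hat."""
    U2 = np.linalg.svd(np.hstack([A11, A12]), full_matrices=False)[0]    # left vectors of [A11 A12]
    V1 = np.linalg.svd(np.vstack([A11, A21]), full_matrices=False)[2].T  # right vectors of [A11; A21]
    Z11, Z12, Z21 = U2.T @ A11 @ V1, U2.T @ A12, A21 @ V1
    for s in range(min(A11.shape), 0, -1):           # assumed order: largest s first
        Z11_s = Z11[:s, :s]
        if np.linalg.matrix_rank(Z11_s) < s:         # skip numerically singular blocks
            continue
        D = Z21[:, :s] @ np.linalg.inv(Z11_s)
        if np.linalg.norm(D, 2) <= T_R:              # break condition
            return s, D @ Z12[:s, :]
    return 0, np.zeros((A21.shape[0], A12.shape[1]))

rng = np.random.default_rng(3)
p1, p2, m1, m2, r = 200, 150, 100, 75, 2             # illustrative sizes
U = np.linalg.qr(rng.standard_normal((p1, p1)))[0]
V = np.linalg.qr(rng.standard_normal((p2, p2)))[0]
k = min(p1, p2)
sv = np.concatenate([[10.0, 8.0], 1e-3 * np.ones(k - r)])   # approx. rank 2
A = (U[:, :k] * sv) @ V[:, :k].T
T_R = 2 * np.sqrt(p1 / m1)                            # threshold as in Corollary 1

r_hat, A22_hat = smc_row_thresholding(A[:m1, :m2], A[:m1, m2:], A[m1:, :m2], T_R)
err = np.linalg.norm(A22_hat - A[m1:, m2:])           # Frobenius norm, q = 2
s_v11 = np.linalg.svd(V[:m2, :r], compute_uv=False).min()
bound = 6.5 * T_R * np.linalg.norm(sv[r:]) * (1 / s_v11 + 1)   # bound (77)
print("r_hat:", r_hat, "error:", err, "bound:", bound)
```

The selected $\hat r$ may exceed $r$; Lemma 7 only asserts $\hat r \ge r$, and the bound (77) is stated for any such $\hat r$.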


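We adopt the definitions in (36), (37), (38); then we have

$$Z_{11,[1:r,1:r]} = U^{(2)\top}_{[:,1:r]}A_{11}V^{(1)}_{[:,1:r]} = \hat M^\top A_{11}\hat N = \hat M^\top U_{11}\Sigma_1V_{11}^\top\hat N + \hat M^\top U_{12}\Sigma_2V_{12}^\top\hat N = B_M + E_M,$$

$$Z_{21,[:,1:r]} = A_{21}V^{(1)}_{[:,1:r]} = \big(U_{21}\Sigma_1V_{11}^\top + U_{22}\Sigma_2V_{12}^\top\big)\hat N = B_L + E_L.$$

Hence,

$$\big\|Z_{21,[:,1:r]}Z_{11,[1:r,1:r]}^{-1}\big\| = \|(B_L + E_L)(B_M + E_M)^{-1}\|$$

$$\le \Big\|B_LB_M^{-1}\sum_{i=0}^\infty(-E_MB_M^{-1})^i\Big\| + \Big\|E_LB_M^{-1}\sum_{i=0}^\infty(-E_MB_M^{-1})^i\Big\| \le \frac{\|B_LB_M^{-1}\| + \|E_L\|\,\|B_M^{-1}\|}{1 - \|E_M\|\,\|B_M^{-1}\|}$$

$$\le \left(\frac{\sqrt{45/44}}{\sigma_{\min}(U_{11})} + \frac{45\sigma_{r+1}(A)}{44\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})}\right)\cdot\frac{1}{1 - \frac{45\sigma_{r+1}(A)}{44\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11})}}$$

$$\le \frac{1.36}{\sigma_{\min}(U_{11})} + 0.35 \le T_R, \tag{78}$$

where the third inequality uses (42) and (44). This finishes the proof of the lemma. □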


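Proof of Lemma 8.

First, since $M_Z \in \mathbb{R}^{\hat r\times r}$ and $N_Z \in \mathbb{R}^{\hat r\times r}$ are orthonormal bases of the column spaces of $B_{M,\hat r}$ and $B_{M,\hat r}^\top$, we have $P_{M_Z} = M_ZM_Z^\top$ and $P_{N_Z} = N_ZN_Z^\top$, and, by (58), (59) and (53),

$$\sigma_{\min}(B_{M1}) = \sigma_{\min}(J_1^\top B_{M,\hat r}K_1) = \sigma_{\min}\big(J_1^\top M_ZM_Z^\top B_{M,\hat r}N_ZN_Z^\top K_1\big) \ge \sigma_{\min}(J_1^\top M_Z)\,\sigma_{\min}(M_Z^\top B_{M,\hat r}N_Z)\,\sigma_{\min}(N_Z^\top K_1)$$

$$\ge 0.859\,\sigma_r(B_{M,\hat r}) \ge 0.859\cdot\frac{3824}{3825}\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}) \ge 3.43\,\sigma_{r+1}(A), \tag{79}$$

which gives (69). Next,

$$\|B_{L1}B_{M1}^{-1}\| = \big\|B_{L,\hat r}K_1\big(J_1^\top B_{M,\hat r}K_1\big)^{-1}\big\| = \Big\|U_{21}\Sigma_1V_{11}^\top V^{(1)}_{[:,1:\hat r]}K_1\big(J_1^\top U^{(2)\top}_{[:,1:\hat r]}U_{11}\Sigma_1V_{11}^\top V^{(1)}_{[:,1:\hat r]}K_1\big)^{-1}\Big\|$$

$$= \Big\|U_{21}\big(J_1^\top U^{(2)\top}_{[:,1:\hat r]}U_{11}\big)^{-1}\Big\| \le \frac{1}{\sigma_{\min}\big(J_1^\top P_{M_Z}(U^{(2)\top}_{[:,1:\hat r]}U_{11})\big)} \le \frac{1}{\sigma_{\min}(J_1^\top M_Z)\,\sigma_{\min}\big(M_Z^\top U^{(2)\top}_{[:,1:\hat r]}U_{11}\big)}$$

$$\le \frac{\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(U_{11})}, \tag{80}$$

where the last inequality uses (58) and (51). This gives the first part of (70). Here we used the fact that $\Sigma_1V_{11}^\top V^{(1)}_{[:,1:\hat r]}K_1$ is a square matrix, and that $M_Z$ is an orthonormal basis of the column space of $U^{(2)\top}_{[:,1:\hat r]}U_{11}$.

Similarly we have the latter part of (70),

$$\|B_{M1}^{-1}B_{R1}\| \le \frac{\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(V_{11})}. \tag{81}$$

Based on the definitions, we have the bound for all "$E$" terms in (60)-(65), i.e. (71). Now we move on to (72). By the SVD of $Z_{11,[1:\hat r,1:\hat r]}$ in (55) and the partition (56), we know

$$\big([J_1\ J_2]^\top Z_{11,[1:\hat r,1:\hat r]}[K_1\ K_2]\big)^{-1} = \begin{bmatrix}\Lambda_1 & 0\\ 0 & \Lambda_2\end{bmatrix}^{-1} = \begin{bmatrix}\Lambda_1^{-1} & 0\\ 0 & \Lambda_2^{-1}\end{bmatrix}.$$

Hence, we have

$$\|(B_{L2} + E_{L2})(B_{M2} + E_{M2})^{-1}\| = \big\|Z_{21,[:,1:\hat r]}K_2\big(J_2^\top Z_{11,[1:\hat r,1:\hat r]}K_2\big)^{-1}\big\|$$

$$= \Big\|Z_{21,[:,1:\hat r]}[K_1\ K_2]\big([J_1\ J_2]^\top Z_{11,[1:\hat r,1:\hat r]}[K_1\ K_2]\big)^{-1} - \big[Z_{21,[:,1:\hat r]}K_1\big(J_1^\top Z_{11,[1:\hat r,1:\hat r]}K_1\big)^{-1}\ \ 0\big]\Big\|$$

$$\le \big\|Z_{21,[:,1:\hat r]}Z_{11,[1:\hat r,1:\hat r]}^{-1}\big\| + \|(B_{L1} + E_{L1})(B_{M1} + E_{M1})^{-1}\|$$

$$\le T_R + \Big\|B_{L1}B_{M1}^{-1}\sum_{i=0}^\infty(-E_{M1}B_{M1}^{-1})^i\Big\| + \Big\|E_{L1}B_{M1}^{-1}\sum_{i=0}^\infty(-E_{M1}B_{M1}^{-1})^i\Big\|$$

$$\le T_R + \big(\|B_{L1}B_{M1}^{-1}\| + \|E_{L1}\|\,\|B_{M1}^{-1}\|\big)\frac{1}{1 - \|E_{M1}\|\,\|B_{M1}^{-1}\|} \le T_R + \left(\frac{\sqrt{3825/3824}}{\sqrt{0.859}\,\sigma_{\min}(U_{11})} + \frac{1}{3.43}\right)\cdot\frac{1}{1 - 1/3.43}, \tag{82}$$

which proves (72). Since $Z_{11,[1:\hat r,1:\hat r]} = B_{M,\hat r} + E_{M,\hat r}$ and, by definition, $\mathrm{rank}(B_{M,\hat r}) \le r$, by Lemma 1 we know

$$\sigma_{r+i}\big(Z_{11,[1:\hat r,1:\hat r]}\big) \le \sigma_{r+1}(B_{M,\hat r}) + \sigma_i(E_{M,\hat r}) = \sigma_i(E_{M,\hat r}), \quad i \ge 1. \tag{83}$$

Then

$$\|B_{M2}\|_q \le \|B_{M2} + E_{M2}\|_q + \|E_{M2}\|_q = \big\|J_2^\top Z_{11,[1:\hat r,1:\hat r]}K_2\big\|_q + \|E_{M2}\|_q$$

$$= \Big(\sum_{i=r+1}^{\hat r}\sigma_i^q\big(Z_{11,[1:\hat r,1:\hat r]}\big)\Big)^{1/q} + \|E_{M2}\|_q \le \Big(\sum_{i=1}^{\hat r - r}\sigma_i^q(E_{M,\hat r})\Big)^{1/q} + \|E_{M2}\|_q$$

$$\le \|E_{M,\hat r}\|_q + \|E_{M2}\|_q \le 2\|A_{-\max(r)}\|_q. \tag{84}$$
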

Same to the process of (80), we know 

1 


< 


a/3825/3824 


^ m in(^ : %] Afi) 70^59^/Hn)' 

Also, 11 V] x 11 < 1. Hence, 

=114 y‘ 2 5| e, ,e, (v,\ vj% K,) (V£ if,) - 1 r 2 \ II, 

<I|Bm 2||, • II WiVj. ( i i? fl ir 1 ) _1 || • ||V3|| 

2^3825/3824 


(85) 


( 86 ) 


v / +859a min (H 11 ) 


|| A_ max(r) ||c 
which proves (73). □


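Proof of Theorem 3

The idea of the proof is to construct two matrices $A^{(1)}, A^{(2)}$, both in $\mathcal{F}_r(M_1, M_2)$, such that they have identical first $m_1$ rows and $m_2$ columns, but differ much in the remaining block. Suppose $a, b, c > 0$ are fixed numbers and $\varepsilon$ is a small real number. We first consider the following 2-by-2 matrix

$$B(\varepsilon) = \begin{bmatrix}a & c\\ b & \frac{bc}{a} + \varepsilon\end{bmatrix}. \tag{87}$$

Suppose the larger and smaller singular values of $B(\varepsilon)$ are $\lambda_{\max}(\varepsilon)$ and $\lambda_{\min}(\varepsilon)$; then we have

$$\lambda_{\max}(\varepsilon) \to \|B(0)\| = \frac{\sqrt{(a^2 + b^2)(a^2 + c^2)}}{a} \tag{88}$$

as $\varepsilon \to 0$; since $\lambda_{\max}(\varepsilon)\cdot\lambda_{\min}(\varepsilon) = |\det(B)| = a|\varepsilon|$, we also have

$$\lambda_{\min}(\varepsilon)/|\varepsilon| \to \frac{a^2}{\sqrt{(a^2 + b^2)(a^2 + c^2)}} \tag{89}$$

as $\varepsilon \to 0$. If $B(\varepsilon)$ defined in (87) has SVD

$$B(\varepsilon) = \begin{bmatrix}u_{11} & u_{12}\\ u_{21} & u_{22}\end{bmatrix}\begin{bmatrix}\lambda_{\max}(\varepsilon) & 0\\ 0 & \lambda_{\min}(\varepsilon)\end{bmatrix}\begin{bmatrix}v_{11} & v_{12}\\ v_{21} & v_{22}\end{bmatrix}^\top, \tag{90}$$

then we also have

$$u_{11} \to \frac{a}{\sqrt{a^2 + b^2}}, \quad u_{21} \to \frac{b}{\sqrt{a^2 + b^2}}, \quad v_{11} \to \frac{a}{\sqrt{a^2 + c^2}}, \quad v_{21} \to \frac{c}{\sqrt{a^2 + c^2}} \tag{91}$$

as $\varepsilon \to 0$.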

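Now we set $a = 1$, $b = \sqrt{1 - M_1^2}/M_1 - \eta$, $c = \sqrt{1 - M_2^2}/M_2 - \eta$, $d = bc/a$, where $\eta$ is some small positive number to be specified later. We construct $A_{11}$, $A_{12}$, $A_{21}$, $A_{22}^{(1)}$ and $A_{22}^{(2)}$ such that

$$A_{11} = \begin{bmatrix}aI_r & 0\\ 0 & 0\end{bmatrix}_{m_1\times m_2}, \quad A_{12} = \begin{bmatrix}cI_r & 0\\ 0 & 0\end{bmatrix}_{m_1\times(p_2 - m_2)}, \quad A_{21} = \begin{bmatrix}bI_r & 0\\ 0 & 0\end{bmatrix}_{(p_1 - m_1)\times m_2}, \tag{92}$$

$$A_{22}^{(1)} = \begin{bmatrix}(d + \varepsilon)I_r & 0\\ 0 & 0\end{bmatrix}_{(p_1 - m_1)\times(p_2 - m_2)}, \quad A_{22}^{(2)} = \begin{bmatrix}(d - \varepsilon)I_r & 0\\ 0 & 0\end{bmatrix}_{(p_1 - m_1)\times(p_2 - m_2)}. \tag{93}$$

Here we use $I_r$ to denote the identity matrix of dimension $r$. Then we construct $A^{(1)}$ and $A^{(2)}$ as

$$A^{(1)} = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}^{(1)}\end{bmatrix}, \quad A^{(2)} = \begin{bmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}^{(2)}\end{bmatrix}, \tag{94}$$
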
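where $A^{(1)}$ and $A^{(2)}$ have identical first $m_1$ rows and $m_2$ columns. Since the SVD of $B(\varepsilon)$ is given as (90), the SVD of $A^{(1)}$ can be written as

$$A^{(1)} = \begin{bmatrix}U_{11}^{(1)} & U_{12}^{(1)}\\ U_{21}^{(1)} & U_{22}^{(1)}\end{bmatrix}\begin{bmatrix}\Sigma_1^{(1)} & 0\\ 0 & \Sigma_2^{(1)}\end{bmatrix}\begin{bmatrix}V_{11}^{(1)} & V_{12}^{(1)}\\ V_{21}^{(1)} & V_{22}^{(1)}\end{bmatrix}^\top,$$

where

$$U_{11}^{(1)} = \begin{bmatrix}u_{11}I_r\\ 0\end{bmatrix}_{m_1\times r}, \quad U_{12}^{(1)} = \begin{bmatrix}u_{12}I_r\\ 0\end{bmatrix}_{m_1\times r}, \quad U_{21}^{(1)} = \begin{bmatrix}u_{21}I_r\\ 0\end{bmatrix}_{(p_1 - m_1)\times r}, \quad U_{22}^{(1)} = \begin{bmatrix}u_{22}I_r\\ 0\end{bmatrix}_{(p_1 - m_1)\times r},$$

$$V_{11}^{(1)} = \begin{bmatrix}v_{11}I_r\\ 0\end{bmatrix}_{m_2\times r}, \quad V_{12}^{(1)} = \begin{bmatrix}v_{12}I_r\\ 0\end{bmatrix}_{m_2\times r}, \quad V_{21}^{(1)} = \begin{bmatrix}v_{21}I_r\\ 0\end{bmatrix}_{(p_2 - m_2)\times r}, \quad V_{22}^{(1)} = \begin{bmatrix}v_{22}I_r\\ 0\end{bmatrix}_{(p_2 - m_2)\times r},$$

$$\Sigma_1^{(1)} = \lambda_{\max}(\varepsilon)I_r, \quad \Sigma_2^{(1)} = \lambda_{\min}(\varepsilon)I_r.$$

Hence,

$$\sigma_{\min}\big(U_{11}^{(1)}\big) = |u_{11}| \to \frac{a}{\sqrt{a^2 + b^2}} = \frac{1}{\sqrt{1 + \big(\sqrt{1 - M_1^2}/M_1 - \eta\big)^2}} > M_1, \quad \text{as } \varepsilon \to 0,$$

$$\sigma_{\min}\big(V_{11}^{(1)}\big) = |v_{11}| \to \frac{a}{\sqrt{a^2 + c^2}} = \frac{1}{\sqrt{1 + \big(\sqrt{1 - M_2^2}/M_2 - \eta\big)^2}} > M_2, \quad \text{as } \varepsilon \to 0.$$

Also, $\|\Sigma_2^{(1)}\| \to 0$ as $\varepsilon \to 0$. So we have $A^{(1)} \in \mathcal{F}_r(M_1, M_2)$ when $\varepsilon$ is small enough. Similarly, $A^{(2)} \in \mathcal{F}_r(M_1, M_2)$ when $\varepsilon$ is small enough. Now we also have $\|A^{(1)}_{-\max(r)}\|_q = (r\lambda_{\min}(\varepsilon)^q)^{1/q} = r^{1/q}\lambda_{\min}(\varepsilon)$, $\|A^{(2)}_{-\max(r)}\|_q = (r\lambda_{\min}(-\varepsilon)^q)^{1/q} = r^{1/q}\lambda_{\min}(-\varepsilon)$, and $\|A_{22}^{(1)} - A_{22}^{(2)}\|_q = (r(2|\varepsilon|)^q)^{1/q} = 2|\varepsilon|r^{1/q}$.

Finally, for any estimate $\hat A_{22}$, we must have

$$\max\left\{\frac{\|\hat A_{22} - A_{22}^{(1)}\|_q}{\|A^{(1)}_{-\max(r)}\|_q},\ \frac{\|\hat A_{22} - A_{22}^{(2)}\|_q}{\|A^{(2)}_{-\max(r)}\|_q}\right\} \ge \frac{\big\|(\hat A_{22} - A_{22}^{(1)}) - (\hat A_{22} - A_{22}^{(2)})\big\|_q}{2\min\big\{\|A^{(1)}_{-\max(r)}\|_q,\ \|A^{(2)}_{-\max(r)}\|_q\big\}}$$

$$\ge \frac{2|\varepsilon|}{2\min\{\lambda_{\min}(\varepsilon), \lambda_{\min}(-\varepsilon)\}} \to \frac{\sqrt{(a^2 + b^2)(a^2 + c^2)}}{a^2} = \sqrt{1 + \Big(\frac{\sqrt{1 - M_1^2}}{M_1} - \eta\Big)^2}\cdot\sqrt{1 + \Big(\frac{\sqrt{1 - M_2^2}}{M_2} - \eta\Big)^2} \tag{95}$$

as $\varepsilon \to 0$, where the limit follows from (89). Since $A^{(1)}, A^{(2)} \in \mathcal{F}_r(M_1, M_2)$ and they have identical first $m_1$ rows and $m_2$ columns, we must have

$$\inf_{\hat A_{22}}\sup_{A\in\mathcal{F}_r(M_1, M_2)}\frac{\|\hat A_{22} - A_{22}\|_q}{\|A_{-\max(r)}\|_q} \ge \sqrt{1 + \Big(\frac{\sqrt{1 - M_1^2}}{M_1} - \eta\Big)^2}\cdot\sqrt{1 + \Big(\frac{\sqrt{1 - M_2^2}}{M_2} - \eta\Big)^2}.$$

Let $\eta \to 0$. Since $M_1, M_2 < 1$, we have

$$\inf_{\hat A_{22}}\sup_{A\in\mathcal{F}_r(M_1, M_2)}\frac{\|\hat A_{22} - A_{22}\|_q}{\|A_{-\max(r)}\|_q} \ge \frac{1}{M_1M_2} \ge \frac{1}{4}\Big(\frac{1}{M_1} + 1\Big)\Big(\frac{1}{M_2} + 1\Big), \tag{96}$$

which finishes the proof of the theorem. □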
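The limits (88) and (89) for the 2-by-2 matrix $B(\varepsilon)$ in (87) can be checked numerically. The sketch below uses arbitrary illustrative values of $a$, $b$, $c$ (assumptions, not the values chosen in the proof) and prints $\lambda_{\min}(\varepsilon)/|\varepsilon|$ next to its claimed limit, together with the determinant identity $\lambda_{\max}(\varepsilon)\lambda_{\min}(\varepsilon) = a|\varepsilon|$.

```python
import numpy as np

a, b, c = 1.0, 0.75, 1.5                       # illustrative values (assumptions)
limit = a**2 / np.sqrt((a**2 + b**2) * (a**2 + c**2))   # right hand side of (89)
norm_B0 = np.sqrt((a**2 + b**2) * (a**2 + c**2)) / a    # right hand side of (88)

for eps in [1e-2, 1e-4, 1e-6]:
    B = np.array([[a, c], [b, b * c / a + eps]])        # the matrix in (87)
    lam = np.linalg.svd(B, compute_uv=False)            # lam[0] >= lam[1]
    ratio = lam[1] / eps                                 # lambda_min(eps) / |eps|
    det_check = lam[0] * lam[1] / (a * eps)              # equals 1 by the determinant identity
    print(eps, ratio, limit, lam[0], norm_B0, det_check)
```

As $\varepsilon$ shrinks, the printed ratio approaches the limit and the largest singular value approaches $\|B(0)\|$.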


Proof of Corollary 1

We first prove the second part of the corollary. We set $\alpha = (136/165)^2$. Since $U_{[:,1:r]} \in \mathbb{R}^{p_1\times r}$ has orthonormal columns, by Lemma 5 and

$$m_1 \ge 12.5\,W_r^{(1)}r(\log r + c) \ge \frac{4}{(1 - \alpha)^2}\cdot W_r^{(1)}r(\log r + c),$$

we have

$$\sigma_{\min}(U_{11}) = \sigma_{\min}\big(U_{[\Omega_1,1:r]}\big) \ge \sqrt{\frac{\alpha m_1}{p_1}} \tag{97}$$

with probability at least $1 - 2\exp(-c)$. When (97) holds, by the condition, we know

$$\sigma_{r+1}(A) \le \frac{1}{5}\sigma_r(A)\sigma_{\min}(V_{11})\sqrt{\frac{m_1}{p_1}} \le \frac{1}{5\sqrt{\alpha}}\sigma_r(A)\sigma_{\min}(V_{11})\sigma_{\min}(U_{11}) \le \frac{1}{4}\sigma_r(A)\sigma_{\min}(V_{11})\sigma_{\min}(U_{11}).$$

When $T_R \ge 2\sqrt{p_1/m_1}$, we have

$$\frac{1.36}{\sigma_{\min}(U_{11})} + 0.35 \le 1.36\sqrt{\frac{p_1}{\alpha m_1}} + 0.35 \le 2\sqrt{\frac{p_1}{m_1}} \le T_R.$$

Hence we can apply Theorem 2; for $1 \le q \le \infty$ we must have

$$\|\hat A_{22} - A_{22}\|_q \le 6.5\,T_R\|A_{-\max(r)}\|_q\left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right), \tag{98}$$

which finishes the proof of the second part of Corollary 1. Besides, the proof for the third part is the same as the second part after we take the transpose of the matrix.

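For the first part, the proof is also similar. Again we set $\alpha = (136/165)^2$. Then we have

$$m_1 \ge \frac{4}{(1 - \alpha)^2}W_r^{(1)}r(\log r + c), \quad m_2 \ge \frac{4}{(1 - \alpha)^2}W_r^{(2)}r(\log r + c),$$

so

$$\sigma_{\min}(U_{11}) \ge \sqrt{\frac{\alpha m_1}{p_1}}, \quad \sigma_{\min}(V_{11}) \ge \sqrt{\frac{\alpha m_2}{p_2}} \tag{99}$$

with probability at least $1 - 4\exp(-c)$. When (99) holds, we have

$$\sigma_{r+1}(A) \le \frac{1}{6}\sigma_r(A)\sqrt{\frac{m_1m_2}{p_1p_2}} \le \frac{1}{6\alpha}\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}) \le \frac{1}{4}\sigma_r(A)\sigma_{\min}(V_{11})\sigma_{\min}(U_{11}).$$

When $T_R = 2\sqrt{p_1/m_1}$ or $T_C = 2\sqrt{p_2/m_2}$, as in the second part we have

$$\frac{1.36}{\sigma_{\min}(U_{11})} + 0.35 \le T_R, \quad \text{or} \quad \frac{1.36}{\sigma_{\min}(V_{11})} + 0.35 \le T_C.$$

Hence we can apply Theorem 2 and get

$$\|\hat A_{22} - A_{22}\|_q \le 6.5\,T_R\|A_{-\max(r)}\|_q\left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right) \le 6.5\cdot 2\sqrt{\frac{p_1}{m_1}}\left(\sqrt{\frac{p_2}{\alpha m_2}} + 1\right)\|A_{-\max(r)}\|_q \le 29\,\|A_{-\max(r)}\|_q\sqrt{\frac{p_1p_2}{m_1m_2}}.$$

□
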

Proof of Corollary 2

Suppose $0 < \alpha_1 < 1$. Since $U_{[:,1:r]} \in \mathbb{R}^{p_1\times r}$ has random orthonormal columns with Haar measure, we can apply Lemma 6 and find some $c > 0$ and $\delta > 0$ such that when $p_1 \ge m_1 \ge cr$,

$$\sigma_{\min}(U_{11}) = \sigma_{\min}\big(U_{[1:m_1,1:r]}\big) \ge \frac{136}{165}\sqrt{\frac{m_1}{p_1}} \tag{100}$$

with probability at least $1 - \exp(-\delta m_1)$. When (100) happens, we have

$$\sigma_{r+1}(A) \le \frac{1}{5}\sigma_r(A)\sigma_{\min}(V_{11})\sqrt{\frac{m_1}{p_1}} \le \frac{1}{4}\sigma_r(A)\sigma_{\min}(U_{11})\sigma_{\min}(V_{11}),$$

$$\frac{1.36}{\sigma_{\min}(U_{11})} + 0.35 \le 1.36\cdot\frac{165}{136}\sqrt{\frac{p_1}{m_1}} + 0.35 \le 2\sqrt{\frac{p_1}{m_1}} \le T_R.$$

Hence we can apply Theorem 2; for $1 \le q \le \infty$, we have

$$\|\hat A_{22} - A_{22}\|_q \le 6.5\,T_R\|A_{-\max(r)}\|_q\left(\frac{1}{\sigma_{\min}(V_{11})} + 1\right), \tag{101}$$

which finishes the proof of the corollary. □


3.1 Description of Cross-Validation

In this section, we describe the cross-validation used in penalized nuclear norm minimization in the numerical comparison in Sections 4 and 5.

First, we construct a grid $T$ of non-negative numbers based on a pre-selected positive integer $N$. Denote

$$\lambda_{\max}^{PN} = \left\|\begin{bmatrix}A_{11} & A_{12}\\ A_{21} & 0\end{bmatrix}\right\|,$$

i.e. the largest singular value of the observed blocks. For penalized nuclear norm minimization, we let $T = \{\lambda_{\max}^{PN}\cdot 10^{-3(1/N)}, \cdots, \lambda_{\max}^{PN}\cdot 10^{-3(N/N)}\}$.

Next, for a given positive integer $K$, we randomly divide the integer set $\{1, \cdots, m_1\}$ into two groups of size $m^{(1)} \approx \frac{(K-1)m_1}{K}$, $m^{(2)} \approx \frac{m_1}{K}$ for $H$ times. For $h = 1, \cdots, H$, we denote by $J_1^h$ and $J_2^h \subseteq \{1, 2, \cdots, m_1\}$ the index sets of the two groups for the $h$-th split. Then the penalized nuclear norm minimization estimator is applied to the first group of data: $A_{11}$, $A_{21}$, $(A_{12})_{[J_1^h,:]}$, i.e. the data of the observation set $\Omega = \{(i,j) : 1 \le j \le m_2, \text{ or } i \in J_1^h,\ m_2 + 1 \le j \le p_2\}$, with each value of the tuning parameter $t \in T$, and we denote the result by $\hat A_h^{PN}(t)$. Note that we did not use the observed block $A_{[J_2^h,(m_2+1):p_2]}$ in calculating $\hat A_h^{PN}(t)$. Instead, $A_{[J_2^h,(m_2+1):p_2]}$ is used to evaluate the performance of the tuning parameter $t \in T$. Set

$$\hat R(t) = \sum_{h=1}^H\Big\|\big(\hat A_h^{PN}(t)\big)_{[J_2^h,(m_2+1):p_2]} - A_{[J_2^h,(m_2+1):p_2]}\Big\|_F^2. \tag{102}$$

Finally, the tuning parameter is chosen as

$$t_* = \arg\min_{t\in T}\hat R(t),$$

and the final estimator $\hat A^{PN}$ is calculated using this choice of the tuning parameter $t_*$.

In all the numerical studies with penalized nuclear norm minimization in Sections 4 and 5, we use 5-fold cross-validation (i.e., $K = 5$) and $N = 10$ to select the tuning parameter.
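The tuning procedure above can be sketched as follows. This is a hedged illustration rather than the code used in the paper: `soft_impute` is a stand-in solver for the penalized nuclear norm problem (Soft-Impute style iterations), the function name `select_t`, the log-spaced grid endpoints, the squared Frobenius validation loss and the synthetic data are all assumptions.

```python
import numpy as np

def soft_impute(A_obs, mask, t, n_iter=50):
    """Stand-in solver for min 0.5*||P_Omega(A - X)||_F^2 + t*||X||_* ."""
    X = np.zeros_like(A_obs)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(np.where(mask, A_obs, X), full_matrices=False)
        X = (U * np.maximum(s - t, 0.0)) @ Vt          # singular value soft-thresholding
    return X

def select_t(A, m1, m2, K=5, H=3, N=10, seed=0):
    """Pick t from the grid by repeated random splits of the observed rows 1..m1."""
    rng = np.random.default_rng(seed)
    obs = np.zeros(A.shape, dtype=bool)
    obs[:, :m2] = True                                 # A11 and A21 are observed
    obs[:m1, m2:] = True                               # A12 is observed
    lam_max = np.linalg.norm(np.where(obs, A, 0.0), 2) # largest singular value
    grid = lam_max * 10.0 ** (-3.0 * np.arange(1, N + 1) / N)
    risk = np.zeros(N)
    for _ in range(H):
        J2 = rng.permutation(m1)[: max(1, m1 // K)]    # held-out rows, about m1/K
        mask = obs.copy()
        mask[J2, m2:] = False                          # hide A_[J2, (m2+1):p2]
        for i, t in enumerate(grid):
            X = soft_impute(np.where(mask, A, 0.0), mask, t)
            risk[i] += np.sum((X[J2, m2:] - A[J2, m2:]) ** 2)
    return grid[np.argmin(risk)], grid, risk

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 24))   # rank-2 signal
A += 0.05 * rng.standard_normal(A.shape)                          # small noise
t_star, grid, risk = select_t(A, m1=15, m2=12)
print("selected t:", t_star)
```

The held-out block is never passed to the solver, which mirrors the remark that $A_{[J_2^h,(m_2+1):p_2]}$ is used only for evaluation.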